Most modern financial markets use a continuous double auction mechanism to store and match
orders and facilitate trading. In this paper we develop a microscopic dynamical statistical model for 
the continuous double auction under the assumption of IID random order flow, and analyze it using 
simulation, dimensional analysis, and theoretical tools based on mean field approximations. The 
model makes testable predictions for basic properties of markets, such as price volatility, the depth 
of stored supply and demand vs. price, the bid-ask spread, the price impact function, and the time 
and probability of filling orders. These predictions are based on properties of order flow and the 
limit order book, such as share volume of market and limit orders, cancellations, typical order size, 
and tick size. Because these quantities can all be measured directly there are no free parameters. 
We show that the order size, which can be cast as a nondimensional granularity parameter, is in 
most cases a more significant determinant of market behavior than tick size. We also provide an 
explanation for the observed highly concave nature of the price impact function. On a broader 
level, this work suggests how stochastic models based on zero-intelligence agents may be useful to 
probe the structure of market institutions. Like the model of perfect rationality, a stochastic
zero-intelligence model can be used to make strong predictions based on a compact set of assumptions,
even if these assumptions are not fully believable.

I. INTRODUCTION

This section provides background and motivation, a 
description of the model, and some historical context 
for work in this area. Section II gives an overview of
the phenomenology of the model, explaining how
dimensional analysis applies in this context, and
presenting a summary of numerical results. Section III
develops an analytic treatment of the model, explaining
some of the numerical findings of Section II. We
conclude in Section IV
with a discussion of how the model may be enhanced to 
bring it closer to real-life markets, and some comments 
comparing the approach taken here to standard models 
based on information arrival and valuation. 



A. Motivation 

In this paper we analyze the continuous double auction 
trading mechanism under the assumption of random
order flow, developing a model introduced by Daniels
et al. This analysis produces quantitative predictions
about the most
basic properties of markets, such as volatility, depth of 
stored supply and demand, the bid-ask spread, the price 
impact, and probability and time to fill. These predic- 
tions are based on the rate at which orders flow into the 
market, and other parameters of the market, such as or- 
der size and tick size. The predictions are falsifiable with 
no free parameters. This extends the original random 
walk model of Bachelier by providing a basis for the 
diffusion rate of prices. The model also provides a possi- 
ble explanation for the highly concave nature of the price 
impact function. Even though some of the assumptions 
of the model are too simple to be literally true, the model 
provides a foundation onto which more realistic assump- 
tions may easily be added. 

The model demonstrates the importance of financial 
institutions in setting prices, and how solving a necessary 
economic function such as providing liquidity can have 
unanticipated side-effects. In a world of imperfect ra- 
tionality and imperfect information, the task of demand 
storage necessarily causes persistence. Under perfect ra- 
tionality all traders would instantly update their orders 
with the arrival of each piece of new information, but 
this is clearly not true for real markets. The limit order 
book, which is the queue used for storing unexecuted or- 
ders, has long memory when there are persistent orders. 
It can be regarded as a device for storing supply and de- 
mand, somewhat like a capacitor is a device for storing 
charge. We show that even under completely random IID 
order flow, the price process displays anomalous diffusion 
and interesting temporal structure. The converse is also 
interesting: For prices to be effectively random, incom- 
ing order flow must be non-random, in just the right way 
to compensate for the persistence. (See the remarks in
Section IV C.)

This work is also of interest from a fundamental point 
of view because it suggests an alternative approach to
doing economics. The assumption of perfect rationality
has been popular in economics because it provides a
parsimonious model that makes strong predictions. In 
the spirit of Gode and Sunder, we show that the
opposite extreme of zero intelligence random behavior 
provides another reference model that also makes very 
strong predictions. Like perfect rationality, zero intelli- 
gence is an extreme simplification that is obviously not 
literally true. But as we show here, it provides a use- 
ful tool for probing the behavior of financial institutions. 
The resulting model may easily be extended by introduc- 
ing simple boundedly rational behaviors. We also differ 
from standard treatments in that we do not attempt to 
understand the properties of prices from fundamental as- 
sumptions about utility. Rather, we split the problem in 
two. We attempt to understand how prices depend on 
order flow rates, leaving the problem of what determines 
these order flow rates for the future. 

One of our main results concerns the average price 
impact function. The liquidity for executing a market 
order can be characterized by a price impact function 
$\Delta p = \phi(\omega, \tau, t)$. Here $\Delta p$ is the
shift in the logarithm of the price at time $t + \tau$
caused by a market order of size $\omega$ placed at time
$t$. Understanding price impact is important for practical
reasons such as minimizing transaction costs, and also
because it is closely related to an excess demand
function,$^1$ providing a natural starting point for
theories of statistical or dynamical properties of
markets. A naive argument predicts that the price impact
$\phi(\omega)$ should increase at least linearly. This
argument goes as follows: Fractional price changes should
not depend on the scale of price. Suppose buying a single
share raises the price by a factor $k > 1$. If $k$ is
constant, buying $\omega$ shares in succession should
raise it by $k^{\omega}$. Thus, if buying $\omega$ shares
all at once affects the price at least as much as buying
them one at a time, the ratio of prices before and after
impact should increase at least exponentially. Taking
logarithms implies that the price impact as we have
defined it above should increase at least linearly.$^2$
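
The naive argument can be written out in two lines. In this short worked sketch, $P$ and $P'$ denote the price before and after the order; these two symbols are introduced here only for illustration, while $k$ and $\omega$ are as in the text.

```latex
\begin{align}
  \frac{P'}{P} &\ge k^{\omega}, \qquad k > 1, \\
  \Delta p = \log P' - \log P &\ge \omega \log k .
\end{align}
```

Since $\log k > 0$ is a constant, the lower bound grows linearly in $\omega$. A strictly concave impact function with the same single-share impact $\log k$ falls below this bound for $\omega > 1$.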

In contrast, empirical studies find that $\phi(\omega)$
for buy orders appears to be concave. Lillo et al. have
shown that for stocks in the NYSE the concave behavior
of the price impact is quite consistent across different
stocks. Our model produces concave price impact
functions that are in qualitative agreement with
these results.

Our work also demonstrates the value of physics
techniques for economic problems. Our analysis makes
extensive use of dimensional analysis, the solution of a
master equation through a generating functional, and a
mean field approach that is commonly used to analyze
nonequilibrium reaction-diffusion systems and
evaporation-deposition problems.

1 In financial models it is common to define an excess
demand function as demand minus supply; when the context
is clear the modifier "excess" is dropped, so that demand
refers to both supply and demand.

2 This has practical implications. It is common practice
to break up orders in order to reduce losses due to market
impact. With a sufficiently concave market impact
function, in contrast, it is cheaper to execute an order
all at once.



B. Background: The continuous double auction 

Most modern financial markets operate continuously. 
The mismatch between buyers and sellers that typically 
exists at any given instant is solved via an order-based 
market with two basic kinds of orders. Impatient traders 
submit market orders, which are requests to buy or sell 
a given number of shares immediately at the best avail- 
able price. More patient traders submit limit orders, or 
quotes, which also state a limit price, corresponding to
the worst allowable price for the transaction. (Note that 
the word "quote" can be used either to refer to the limit 
price or to the limit order itself.) Limit orders often fail 
to result in an immediate transaction, and are stored in 
a queue called the limit order book. Buy limit orders 
are called bids, and sell limit orders are called offers or 
asks. We use the logarithmic price $a(t)$ to denote the
position of the best (lowest) offer and $b(t)$ for the
position of the best (highest) bid. These are also called
the inside quotes. There is typically a non-zero price
gap between them, called the spread
$s(t) = a(t) - b(t)$. Prices are
not continuous, but rather have discrete quanta called 
ticks. Throughout this paper, all prices will be expressed 
as logarithms, and to avoid endless repetition, the word 
price will mean the logarithm of the price. The minimum 
interval that prices change on is the tick size dp (also de- 
fined on a logarithmic scale; note this is not true for real 
markets). Note that dp is not necessarily infinitesimal. 

As market orders arrive they are matched against limit
orders of the opposite sign in order of first price and
then arrival time, as shown in Fig. 1. Because orders are
placed for varying numbers of shares, matching is not
necessarily one-to-one. For example, suppose the best
offer is for 200 shares at $60 and the next best is for
300 shares at $60.25; a buy market order for 250 shares
buys 200 shares at $60 and 50 shares at $60.25, moving
the best offer a(t) from $60 to $60.25. A high density
of limit orders per price results in high liquidity for
market orders, i.e., it decreases the price movement when
a market order is placed. Let n(p, t) be the stored
density of limit order volume at price p, which we will
call the depth profile of the limit order book at any
given time t. The total stored limit order volume at
price level p is n(p, t) dp. For unit order size the
shift in the best ask
a(t) produced by a buy market order is given by solving 
the equation 



$$\omega = \sum_{p = a(t)}^{p'} n(p, t)\, dp \qquad (1)$$

for $p'$. The shift in the best ask, $p' - a(t)$, is the
instantaneous price impact for buy market orders. A
similar statement applies for sell market orders, where
the price impact can be defined in terms of the shift in
the best bid. (Alternatively, it is also possible to
define the price impact in terms of the change in the
midpoint price).

FIG. 1: A schematic illustration of the continuous double
auction mechanism and our model of it. Limit orders are
stored in the limit order book. We adopt the arbitrary
convention that buy orders are negative and sell orders
are positive. As a market order arrives, it has
transactions with limit orders of the opposite sign, in
order of price (first) and time of arrival (second). The
best quotes at prices a(t) or b(t) move whenever an
incoming market order has sufficient size to fully
deplete the stored volume at a(t) or b(t). Our model
assumes that market order arrival, limit order arrival,
and limit order cancellation follow a Poisson process.
New offers (sell limit orders) can be placed at any price
greater than the best bid, and are shown here as "raining
down" on the price axis. Similarly, new bids (buy limit
orders) can be placed at any price less than the best
offer. Bids and offers that fall inside the spread become
the new best bids and offers. All prices in this model
are logarithmic.
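
The matching rule and the discrete sum in Eq. (1) can be illustrated with the numerical example given earlier in this section. The following is a minimal Python sketch, not code from the original study: the function name `execute_buy_market_order` and the list-of-tuples book are assumptions made here for illustration, time priority within a price level is ignored, and prices are plain dollar prices rather than logarithms.

```python
def execute_buy_market_order(offers, size):
    """Match a buy market order against resting offers.

    offers: list of (price, shares) tuples sorted by increasing price.
    size:   number of shares to buy.
    Returns (fills, remaining_offers); fills lists the (price, shares)
    that actually traded.
    """
    fills = []
    remaining = []
    left = size
    for price, shares in offers:
        if left > 0:
            traded = min(left, shares)
            fills.append((price, traded))
            left -= traded
            if shares > traded:
                # Part of this price level survives the order.
                remaining.append((price, shares - traded))
        else:
            remaining.append((price, shares))
    return fills, remaining


# The example from the text: 200 shares at $60, 300 shares at $60.25,
# and a buy market order for 250 shares.
offers = [(60.00, 200), (60.25, 300)]
fills, book = execute_buy_market_order(offers, 250)
best_ask_before = offers[0][0]
best_ask_after = book[0][0]
print(fills)                            # [(60.0, 200), (60.25, 50)]
print(best_ask_before, best_ask_after)  # 60.0 60.25
```

The difference between the two best asks is the instantaneous price impact of the order in this toy book.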

We will refer to a buy limit order whose limit price 
is greater than the best ask, or a sell limit order whose 
limit price is less than the best bid, as a crossing limit 
order or marketable limit order. Such limit orders result 
in immediate transactions, with at least part of the order 
immediately executed. 



C. The model 

This model, introduced by Daniels et al., is designed to
be as analytically tractable as possible while capturing
key features of the continuous double auction. All the
order flows are modeled as Poisson processes. We assume
that market orders arrive in chunks of $\sigma$ shares, at
a rate of $\mu$ shares per unit time. The market order may
be a 'buy' order or a 'sell' order with equal probability.
(Thus the rate at which buy orders or sell orders arrive
individually is $\mu/2$.) Limit orders arrive in chunks of
$\sigma$ shares as well, at a rate of $\alpha$ shares per
unit price and per unit time for buy orders and also for
sell orders. Offers are placed with uniform probability at
integer multiples of a tick size $dp$ in the range of
price $b(t) < p < \infty$, and similarly for bids on
$-\infty < p < a(t)$. When a market order arrives it
causes a transaction; under the assumption of constant
order size, a buy market order removes an offer at price
$a(t)$, and if it was the last offer at that price, moves
the best ask up to the next occupied price tick.
Similarly, a sell market order removes a bid at price
$b(t)$, and if it is the last bid at that price, moves the
best bid down to the next occupied price tick. In
addition, limit orders may also be removed spontaneously
by being canceled or by expiring, even without a
transaction having taken place. We model this by letting
them be removed randomly with constant probability
$\delta$ per unit time.
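
The order flow rules above translate fairly directly into an event-driven simulation. The sketch below is a minimal illustration under stated assumptions and is not the simulation code used for the results in this paper: the function name `simulate_book` is hypothetical, the infinite price axis is truncated to a finite window of `n_ticks` ticks, the book starts with one order per tick on each side, and all orders have the constant size $\sigma$. Rates are converted from shares per unit time to orders per unit time by dividing by $\sigma$.

```python
import random

def simulate_book(alpha, mu, delta, sigma=1.0, dp=1.0,
                  n_ticks=200, n_events=5000, seed=0):
    """Event-driven sketch of the random order flow model.

    Returns (mean_spread, bids, asks): the time-weighted mean of
    a(t) - b(t) in price units, and the number of resting orders at
    each tick when the run ends.
    """
    rng = random.Random(seed)
    half = n_ticks // 2
    # ASSUMPTION: initial book with one order per tick on each side.
    bids = [1] * half + [0] * (n_ticks - half)
    asks = [0] * half + [1] * (n_ticks - half)
    spread_time = 0.0
    total_time = 0.0
    for _ in range(n_events):
        # Best quotes; -1 and n_ticks are sentinels for an empty side.
        b = max((i for i in range(n_ticks) if bids[i]), default=-1)
        a = min((i for i in range(n_ticks) if asks[i]), default=n_ticks)
        n_bids, n_asks = sum(bids), sum(asks)
        # Event rates, in orders per unit time.
        rates = [
            mu / (2 * sigma),                        # buy market order
            mu / (2 * sigma),                        # sell market order
            alpha * dp / sigma * (n_ticks - 1 - b),  # offer at a tick > b
            alpha * dp / sigma * a,                  # bid at a tick < a
            delta * n_asks,                          # cancel one offer
            delta * n_bids,                          # cancel one bid
        ]
        dt = rng.expovariate(sum(rates))  # waiting time to next event
        if n_bids and n_asks:
            spread_time += (a - b) * dp * dt
            total_time += dt
        event = rng.choices(range(6), weights=rates)[0]
        if event == 0 and n_asks:
            asks[a] -= 1
        elif event == 1 and n_bids:
            bids[b] -= 1
        elif event == 2:
            asks[rng.randrange(b + 1, n_ticks)] += 1
        elif event == 3:
            bids[rng.randrange(0, a)] += 1
        elif event == 4:
            # Every resting offer is equally likely to be canceled.
            asks[rng.choices(range(n_ticks), weights=asks)[0]] -= 1
        elif event == 5:
            bids[rng.choices(range(n_ticks), weights=bids)[0]] -= 1
    mean_spread = spread_time / total_time if total_time > 0 else float("nan")
    return mean_spread, bids, asks


# Illustrative parameter values only; they are not fitted to any market.
mean_spread, bids, asks = simulate_book(alpha=0.5, mu=2.0, delta=0.5)
```

Because offers are only placed above the best bid and bids only below the best ask, the book in this sketch can never cross. A run this short only gives a rough estimate of the mean spread; longer runs and a wider window would be needed for quantitative comparisons.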

While the assumption of limit order placement over 
an infinite interval is clearly unrealistic, it provides a 
tractable boundary condition for modeling the behav- 
ior of the limit order book near the midpoint price 
m(t) = (a(t)+b(t))/2, which is the region of interest since 
it is where transactions occur. Limit orders far from the 
midpoint are usually canceled before they are executed 
(we demonstrate this later in Fig. |J), and so far from 
the midpoint, limit order arrival and cancellation have a 
steady state behavior characterized by a simple Poisson 
distribution. Although under the limit order placement 
process the total number of orders placed per unit time 
is infinite, the order placement per unit price interval is 
bounded and thus the assumption of an infinite interval 
creates no problems. Indeed, it guarantees that there are 
always an infinite number of limit orders of both signs 
stored in the book, so that the bid and ask are always 
well-defined and the book never empties. (Under other 
assumptions about limit order placement this is not nec- 
essarily true, as we later demonstrate in Fig. H.) We 
are also considering versions of the model involving more 
realistic order placement functions; see the discussion in 
Section IV.



In this model, to keep things simple, we are using the 
conceptual simplification of effective market orders and 
effective limit orders. When a crossing limit order is 
placed part of it may be executed immediately. The effect 
of this part on the price is indistinguishable from that of 
a market order of the same size. Similarly, given that 
this market order has been placed, the remaining part is 
equivalent to a non-crossing limit order of the same size. 
Thus a crossing limit order can be modeled as an effec- 
tive market order followed by an effective (non-crossing) 
limit order.$^3$ Working in terms of effective market and
limit orders affects data analysis: The effective market
order arrival rate $\mu$ combines both pure market orders
and the immediately executed components of crossing
limit orders, and similarly the limit order arrival rate
$\alpha$ corresponds only to the components of limit
orders that are not executed immediately. This is
consistent with the boundary conditions for the order
placement process, since an offer with $p < b(t)$ or a bid
with $p > a(t)$ would result in an immediate transaction,
and thus would be effectively the same as a market order.
Defining the order placement process with these boundary
conditions realistically allows limit orders to be placed
anywhere inside the spread.

3 In assigning independently random distributions for the
two events, our model neglects the correlation between
market and limit order arrival induced by crossing limit
orders.
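
The decomposition of a crossing limit order into an effective market order and an effective limit order can be sketched as follows. This is an illustrative Python fragment only: the function name `split_crossing_buy` and the book representation are assumptions introduced here, and the example prices are plain dollar prices.

```python
def split_crossing_buy(limit_price, size, offers):
    """Split a buy limit order into its two effective components.

    offers: list of (price, shares) tuples sorted by increasing price.
    Returns (effective_market_size, effective_limit_size): the part that
    executes immediately and the part left resting at limit_price.
    """
    executed = 0
    for price, shares in offers:
        # Stop at the first offer priced above the limit, or once filled.
        if price > limit_price or executed == size:
            break
        executed += min(shares, size - executed)
    return executed, size - executed


offers = [(60.00, 200), (60.25, 300)]
# A 300-share buy limit order at 60.10 crosses the best offer only.
print(split_crossing_buy(60.10, 300, offers))  # (200, 100)
# A buy limit order below the best offer does not cross at all.
print(split_crossing_buy(59.90, 300, offers))  # (0, 300)
```

When measuring rates from data under this convention, the first component would be counted toward $\mu$ and the second toward $\alpha$.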

Another simplification of this model is the use of loga- 
rithmic prices, both for the order placement process and 
for the tick size dp. This has the important advantage 
that it ensures that prices are always positive. In real 
markets price ticks are linear, and the use of logarithmic 
price ticks is an approximation that makes both the cal- 
culations and the simulation more convenient. We find 
that the limit $dp \to 0$, where tick size is irrelevant, is
a good approximation for many purposes. We find that 
tick size is less important than other parameters of the 
problem, which provides some justification for the ap- 
proximation of logarithmic price ticks. 

Assuming a constant probability for cancellation is 
clearly ad hoc, but in simulations we find that other 
assumptions with well-defined timescales, such as con- 
stant duration time, give similar results. For our analytic 
model we use a constant order size $\sigma$. In
simulations we also use variable order size, e.g.
half-normal distributions with standard deviation
$\sqrt{\pi/2}\,\sigma$, which ensures that the mean value
remains $\sigma$. As long as these distributions have thin
tails, the differences do not qualitatively affect most of
the results reported here, except in a trivial way. As
discussed in Section IV B, decay processes
without well-defined characteristic times and size distri- 
butions with power law tails give qualitatively different 
results and will be treated elsewhere. 
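
The choice of scale for the half-normal order sizes can be checked directly. In this short worked sketch, $X$ is a zero-mean normal variable and $s$ is its standard deviation; both symbols are introduced here only for illustration.

```latex
\begin{align}
  \mathrm{E}|X| &= \int_0^\infty x \,\frac{2}{s\sqrt{2\pi}}\,
                   e^{-x^2/2s^2}\, dx = s\sqrt{\frac{2}{\pi}}, \\
  s = \sqrt{\frac{\pi}{2}}\,\sigma
  \;&\Longrightarrow\; \mathrm{E}|X| = \sigma .
\end{align}
```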

Even though this model is simply defined, the time 
evolution is not trivial. One can think of the dynamics 
as being composed of three parts: (1) the buy market 
order/sell limit order interaction, which determines the 
best ask; (2) the sell market order/buy limit order in- 
teraction, which determines the best bid; and (3) the 
random cancellation process. Processes (1) and (2) de- 
termine each others' boundary conditions. That is, pro- 
cess (1) determines the best ask, which sets the bound- 
ary condition for limit order placement in process (2), 
and process (2) determines the best bid, which deter- 
mines the boundary conditions for limit order placement 
in process (1). Thus processes (1) and (2) are strongly 
coupled. It is this coupling that causes the bid and ask 
to remain close to each other, and guarantees that the 
spread $s(t) = a(t) - b(t)$ is a stationary random variable,
even though the bid and ask are not. It is the coupling of 
these processes through their boundary conditions that 
provides the nonlinear feedback that makes the price
process complex.



D. Summary of prior work

There are two independent lines of prior work, one in 
the financial economics literature, and the other in the 
physics literature. The models in the economics litera- 
ture are directed toward empirical analysis, and treat the 
order process as static. In contrast, the models in the 
physics literature are conceptual toy models, but they 
allow the order process to react to changes in prices, and 
are thus fully dynamic. Our model bridges this gap. This 
is explained in more detail below. 

The first model of this type that we are aware of was
due to Mendelson, who modeled random order placement
with periodic clearing. This was developed along
different directions by Cohen et al. [13], who used
techniques from queuing theory, but assumed only one
price level and addressed the issue of time priority at
that level (motivated by the existence of a specialist
who effectively pinned prices to make them stationary).
Domowitz and Wang [14] and Bollerslev et al. further
developed this to allow more general order placement
processes that depend on prices, but without solving the
full dynamical problem. This allows them to get a
stationary solution for prices. In contrast, in our model
the prices that emerge make a random walk, and so are
much more realistic. In order to get a solution for the
depth of the order book we have to go into price
coordinates that co-move with the random walk. Dealing
with the feedback between order placement and prices
makes the problem much more difficult, but it is key for
getting reasonable results.

The models in the physics literature incorporate price
dynamics, but have tended to be conceptual toy models
designed to understand the anomalous diffusion
properties of prices. This line of work begins with a
paper by Bak et al. [16], which was developed by Eliezer
and Kogan [17] and by Tang [18]. They assume that limit
orders are placed at a fixed distance from the midpoint,
and that the limit prices of these orders are then
randomly shuffled until they result in transactions. It
is the random shuffling that causes price diffusion. This
assumption, which we feel is unrealistic, was made to
take advantage of the analogy to a standard
reaction-diffusion model in the physics literature.
Maslov introduced an alternative model that was solved
analytically in the mean-field limit by Slanina [20].
Each order is randomly chosen to be either a buy or a
sell, and either a limit order or a market order. If a
limit order, it is randomly placed within a fixed
distance of the current price. This again gives rise to
anomalous price diffusion. A model allowing limit orders
with Poisson order cancellation was proposed by Challet
and Stinchcombe [21]. Iori and Chiarella have numerically
studied a model including fundamentalists and technical
traders.

The model studied in this paper was introduced by
Daniels et al. This adds to the literature by introducing
a model that treats the feedback between order placement
and price movement, while having enough realism so that
the parameters can be tested against real data. The prior
models in the physics literature have tended to focus
primarily on the anomalous diffusion of prices. While
interesting and important for refining risk calculations,
this is a second-order effect. In contrast, we focus on
the first order effects of primary interest to market
participants, such as the bid-ask spread, volatility,
depth profile, price impact, and the probability and time
to fill an order. We demonstrate how dimensional analysis
becomes a useful tool in an economic setting, and develop
mean field theories in a context that is more challenging
than that of the toy models of previous work.

Subsequent to Daniels et al., Bouchaud et al.
demonstrated that, under the assumption that prices
execute a random walk, by introducing an additional free
parameter they can derive a simple equation for the depth
profile. In this paper we show how to do this from first
principles without introducing a free parameter.



II. OVERVIEW OF PREDICTIONS OF THE 
MODEL 

In this section we give an overview of the phenomenol- 
ogy of the model. Because this model has five parame- 
ters, understanding all their effects would generally be a 
complicated problem in and of itself. This task is greatly 
simplified by the use of dimensional analysis, which re- 
duces the number of independent parameters from five 
to two. Thus, before we can even review the results, we 
need to first explain how dimensional analysis applies in 
this setting. One of the surprising aspects of this model 
is that one can derive several powerful results using the 
simple technique of dimensional analysis alone. 

Unless otherwise mentioned the results presented in 
this section are based on simulations. These results are
compared to theoretical predictions in Section III.



A. Dimensional analysis 

Because dimensional analysis is not commonly used 
in economics we first present a brief review. For more 
details see Bridgman [24].

Dimensional analysis is a technique that is commonly 
used in physics and engineering to reduce the number 
of independent degrees of freedom by taking advantage 
of the constraints imposed by dimensionality. For suf- 
ficiently constrained problems it can be used to guess 
the answer to a problem without doing a full analysis. 
The idea is to write down all the factors that a given 
phenomenon can depend on, and then find the combi- 
nation that has the correct dimensions. For example, 
consider the problem of the period of a pendulum: The
period $T$ has dimensions of time. Obvious candidates
that it might depend on are the mass of the bob $m$ (which
has units of mass), the length $l$ (which has units of
distance), and the acceleration of gravity $g$ (which has
units of distance/time$^2$). There is only one way to
combine these to produce something with dimensions of
time, i.e. $T \sim \sqrt{l/g}$. This determines the
correct formula for the period of a pendulum up to a
constant. Note that it makes it clear that the period does
not depend on the mass, a result that is not obvious a
priori. We were lucky in this problem because there were
three parameters and three dimensions, with a unique
combination of the parameters having the right
dimensions; in general dimensional analysis can only be
used to reduce the number of free parameters through the
constraints imposed by their dimensions.

Parameter   Description                 Dimensions

$\alpha$    limit order rate            shares/(price time)
$\mu$       market order rate           shares/time
$\delta$    order cancellation rate     1/time
$dp$        tick size                   price
$\sigma$    characteristic order size   shares

TABLE I: The five parameters that characterize this model.
$\alpha$, $\mu$, and $\delta$ are order flow rates, and
$dp$ and $\sigma$ are discreteness parameters.
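
The exponent matching behind the pendulum result can be written out explicitly. In this short worked sketch, the exponents $x$, $y$, $z$ are introduced only for illustration: suppose $T \sim m^{x} l^{y} g^{z}$ and compare dimensions on both sides.

```latex
\begin{align}
  [T] &= \text{mass}^{x}\,\text{distance}^{\,y+z}\,\text{time}^{-2z}
       = \text{time}^{1}, \\
  x &= 0, \qquad y + z = 0, \qquad -2z = 1
  \;\Longrightarrow\; z = -\tfrac{1}{2},\; y = \tfrac{1}{2}, \\
  T &\sim l^{1/2} g^{-1/2} = \sqrt{l/g} .
\end{align}
```

Three unknown exponents and three dimensional constraints leave no freedom, which is why the answer is unique up to a constant.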

For this problem the three fundamental dimensions in
the model are shares, price, and time. Note that by price,
we mean the logarithm of price; as long as we are
consistent, this does not create problems with the
dimensional analysis. There are five parameters: three
rate constants and two discreteness parameters. The order
flow rates are $\mu$, the market order arrival rate, with
dimensions of shares per time; $\alpha$, the limit order
arrival rate per unit price, with dimensions of shares per
price per time; and $\delta$, the rate of limit order
decays, with dimensions of 1/time. These play a role
similar to rate constants in physical problems. The two
discreteness parameters are the price tick size $dp$, with
dimensions of price, and the order size $\sigma$, with
dimensions of shares. This is summarized in table I.

Dimensional analysis can be used to reduce the number
of relevant parameters. Because there are five parameters
and three dimensions (price, shares, time), and because
in this case the dimensionality of the parameters is
sufficiently rich, the dimensional relationships reduce
the degrees of freedom, so that all the properties of the
limit-order book can be described by functions of two
parameters. It is useful to construct these two
parameters so that they are nondimensional.

We perform the dimensional reduction of the model
by guessing that the effect of the order flow rates is
primary to that of the discreteness parameters. This
leads us to construct nondimensional units based on the
order flow parameters alone, and take nondimensionalized
versions of the discreteness parameters as the
independent parameters whose effects remain to be
understood. As we will see, this is justified by the fact
that many of the properties of the model depend only
weakly on the discreteness parameters. We can thus
understand much of the richness of the phenomenology of
the model through dimensional analysis alone.

Parameter    Description                       Expression

$N_c$        characteristic number of shares   $\mu/2\delta$
$p_c$        characteristic price interval     $\mu/2\alpha$
$t_c$        characteristic time               $1/\delta$
$dp/p_c$     nondimensional tick size          $2\alpha\,dp/\mu$
$\epsilon$   nondimensional order size         $2\delta\sigma/\mu$

TABLE II: Important characteristic scales and
nondimensional quantities. We summarize the
characteristic share size, price and times defined by the
order flow rates, as well as the two nondimensional scale
parameters $dp/p_c$ and $\epsilon$ that characterize the
effect of finite tick size and order size. Dimensional
analysis makes it clear that all the properties of the
limit order book can be characterized in terms of
functions of these two parameters.

There are three order flow rates and three fundamental
dimensions. If we temporarily ignore the discreteness
parameters, there are unique combinations of the order
flow rates with units of shares, price, and time. These
define a characteristic number of shares
$N_c = \mu/2\delta$, a characteristic price interval
$p_c = \mu/2\alpha$, and a characteristic timescale
$t_c = 1/\delta$. This is summarized in table II. The
factors of two occur because we have defined the market
order rate for either a buy or a sell order to be
$\mu/2$. We can thus express everything in the model in
nondimensional terms by dividing by $N_c$, $p_c$, or
$t_c$ as appropriate, e.g. to measure shares in
nondimensional units $\hat{N} = N/N_c$, or to measure
price in nondimensional units $\hat{p} = p/p_c$.
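
The dimensional bookkeeping above can be checked mechanically. The following minimal Python sketch is not part of the original analysis; the names `DIMS`, `dims_of`, and `characteristic_scales` are introduced here purely for illustration. It stores each parameter's dimensions as exponents of (shares, price, time), confirms the units of the characteristic scales and of the two nondimensional groups, and evaluates the scales for one arbitrary set of rates.

```python
# Exponents of (shares, price, time) for each model parameter.
DIMS = {
    "alpha": (1, -1, -1),  # shares / (price * time)
    "mu":    (1, 0, -1),   # shares / time
    "delta": (0, 0, -1),   # 1 / time
    "dp":    (0, 1, 0),    # price
    "sigma": (1, 0, 0),    # shares
}

def dims_of(powers):
    """Dimensions of a product of parameters raised to integer powers.

    powers maps a parameter name to its exponent,
    e.g. {"mu": 1, "delta": -1} for mu / delta.
    """
    total = [0, 0, 0]
    for name, power in powers.items():
        for i, d in enumerate(DIMS[name]):
            total[i] += power * d
    return tuple(total)

def characteristic_scales(alpha, mu, delta):
    """Return (N_c, p_c, t_c) built from the order flow rates alone."""
    return mu / (2 * delta), mu / (2 * alpha), 1 / delta

print(dims_of({"mu": 1, "delta": -1}))              # (1, 0, 0): shares
print(dims_of({"mu": 1, "alpha": -1}))              # (0, 1, 0): price
print(dims_of({"delta": -1}))                       # (0, 0, 1): time
print(dims_of({"alpha": 1, "dp": 1, "mu": -1}))     # (0, 0, 0): dp / p_c
print(dims_of({"delta": 1, "sigma": 1, "mu": -1}))  # (0, 0, 0): epsilon

# Illustrative values only (not fitted to any market).
N_c, p_c, t_c = characteristic_scales(alpha=0.5, mu=0.2, delta=0.001)
```

Numerical factors such as the 2 in $\mu/2\delta$ carry no dimensions, so they are left out of the exponent check and appear only in `characteristic_scales`.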

The value of using nondimensional units is illustrated
in Fig. 2. Fig. 2(a) shows the average depth profile for
three different values of $\mu$ and $\delta$ with the
other parameters held fixed. When we plot these results in
dimensional units the results look quite different.
However, when we plot them in terms of nondimensional
units, as shown in Fig. 2(b), the results are
indistinguishable. As explained below, because we have
kept the nondimensional order size fixed, the collapse is
perfect. Thus, the problem of understanding the behavior
of this model is reduced to studying the effect of tick
size and order size.

To understand the effect of tick size and order size it is
useful to do so in nondimensional terms. The
nondimensional scale parameter based on tick size is
constructed by dividing by the characteristic price, i.e.
$dp/p_c = 2\alpha\,dp/\mu$. The theoretical analysis and
the simulations show that there is a sensible continuum
limit as the tick size $dp \to 0$, in the sense that there
is non-zero price diffusion and a finite spread.
Furthermore, the dependence on tick size is weak, and for
many purposes the limit $dp \to 0$ approximates the case
of finite tick size fairly well. As we will see, working
in this limit is essential for getting tractable analytic
results.

A nondimensional scale parameter based on order size
is constructed by dividing the typical order size (which
is measured in shares) by the characteristic number of
shares $N_c$, i.e.
$\epsilon = \sigma/N_c = 2\delta\sigma/\mu$. $\epsilon$
characterizes the "chunkiness" of the orders stored in the
limit order book. As we will see, $\epsilon$ is an
important determinant of liquidity, and it is a
particularly important determinant of volatility. In the
continuum limit $\epsilon \to 0$ there is no price
diffusion. This is because price diffusion can occur only
if there is a finite probability for price levels outside
the spread to be empty, thus allowing the best bid or ask
to make a persistent shift. If we let $\epsilon \to 0$
while the average depth is held fixed the number of
individual orders becomes infinite, and the probability
that spontaneous decays or market orders can create gaps
outside the spread becomes zero. This is verified in
simulations. Thus the limit $\epsilon \to 0$ is always a
poor approximation to a real market. $\epsilon$ is a more
important parameter than the tick size $dp/p_c$. In the
mean field analysis in Section III, we let
$dp/p_c \to 0$, reducing the number of independent
parameters from two to one, and in many cases find that
this is a good approximation.

FIG. 2: The usefulness of nondimensional units. (a) We
show the average depth profile for three different
parameter sets. The parameters $\alpha = 0.5$,
$\sigma = 1$, and $dp = 0$ are held constant, while
$\delta$ and $\mu$ are varied. The line types are:
(dotted) $\delta = 0.001$, $\mu = 0.2$; (dashed)
$\delta = 0.002$, $\mu = 0.4$ and (solid)
$\delta = 0.004$, $\mu = 0.8$. (b) is the same, but
plotted in nondimensional units. The horizontal axis has
units of price, and so has nondimensional units
$\hat{p} = p/p_c = 2\alpha p/\mu$. The vertical axis has
units of $n$ shares/price, and so has nondimensional
units $\hat{n} = n p_c/N_c = n\delta/\alpha$. Because we
have chosen the parameters to keep the nondimensional
order size $\epsilon$ constant, the collapse is perfect.
Varying the tick size has little effect on the results
other than making them discrete.

Quantity                 Dimensions        Scaling relation

Asymptotic depth         shares/price      $d \sim \alpha/\delta$
Spread                   price             $s \sim \mu/\alpha$
Slope of depth profile   shares/price$^2$  $\lambda \sim \alpha^2/\mu\delta = d/s$
Price diffusion rate     price$^2$/time    $D_0 \sim \mu^2\delta/\alpha^2$

TABLE III: Estimates from dimensional analysis for the
scaling of a few market properties based on order flow
rates alone. $\alpha$ is the limit order density rate,
$\mu$ is the market order rate, and $\delta$ is the
spontaneous limit order removal rate. These estimates are
constructed by taking the combinations of these three
rates that have the proper units. They neglect the
dependence on the order granularity $\epsilon$ and the
nondimensional tick size $dp/p_c$. More accurate relations
from simulation and theory are given in table IV.

The order size $\sigma$ can be thought of as the order granularity. Just as the properties of a beach with fine sand are quite different from those of one populated by fist-sized boulders, a market with many small orders behaves quite differently from one with a few large orders. $N_c$ provides the scale against which the order size is measured, and $\epsilon$ characterizes the granularity in relative terms. Alternatively, $1/\epsilon$ can be thought of as the annihilation rate from market orders expressed in units of the size of spontaneous decays. Note that in nondimensional units the number of shares can also be written $\hat N = N/N_c = N\epsilon/\sigma$.
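The nondimensionalization described above can be sketched in a few lines of Python. This is only an illustrative sketch: the class and function names are hypothetical, the characteristic time $t_c = 1/\delta$ is taken from the discussion of the decay rate, and the three parameter sets are the values quoted in the Fig. 2 caption as we read them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderFlow:
    """Order flow parameters of the model (field names are illustrative)."""
    alpha: float  # limit order rate density, shares / (price * time)
    mu: float     # market order rate, shares / time
    delta: float  # spontaneous removal rate of limit orders, 1 / time
    sigma: float  # typical order size, shares
    dp: float     # tick size, price


def characteristic_scales(f: OrderFlow) -> dict:
    """Return N_c, p_c, t_c and the two nondimensional parameters."""
    n_c = f.mu / (2.0 * f.delta)  # characteristic number of shares
    p_c = f.mu / (2.0 * f.alpha)  # characteristic price interval
    t_c = 1.0 / f.delta           # characteristic time
    return {
        "N_c": n_c,
        "p_c": p_c,
        "t_c": t_c,
        "epsilon": f.sigma / n_c,  # granularity, equal to 2 * delta * sigma / mu
        "tick": f.dp / p_c,        # nondimensional tick, equal to 2 * alpha * dp / mu
    }


# Parameter sets as read from the Fig. 2 caption (sigma, alpha, dp held fixed).
fig2_sets = [
    OrderFlow(alpha=1.0, mu=0.2, delta=0.001, sigma=0.5, dp=0.0),
    OrderFlow(alpha=1.0, mu=0.4, delta=0.002, sigma=0.5, dp=0.0),
    OrderFlow(alpha=1.0, mu=0.8, delta=0.004, sigma=0.5, dp=0.0),
]
fig2_epsilons = [characteristic_scales(f)["epsilon"] for f in fig2_sets]
```

All three sets give the same value of $\epsilon$, which is why the depth profiles collapse onto one curve when plotted in nondimensional units.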

The construction of the nondimensional granularity parameter illustrates the importance of including a spontaneous decay process in this model. If $\delta = 0$ (which implies $\epsilon = 0$) there is no spontaneous decay of orders, and depending on the relative values of $\mu$ and $\alpha$, generically either the depth of orders will accumulate without bound or the spread will become infinite. As long as $\delta > 0$, in contrast, this is not a problem.

For some purposes the effects of varying tick size and order size are fairly small, and we can derive approximate formulas using dimensional analysis based only on the order flow rates. For example, in table III we give dimensional scaling formulas for the average spread, the market order liquidity (as measured by the average slope of the depth profile near the midpoint), the volatility, and the asymptotic depth (defined below). Because these estimates neglect the effects of discreteness, they are only approximations of the true behavior of the model, and they explain some properties better than others. Our numerical and analytical results show that some quantities also depend on the granularity parameter $\epsilon$ and to a weaker extent on the tick size $dp/p_c$. Nonetheless, the dimensional estimates based on order flow alone provide a good starting point for understanding market behavior. A comparison to more precise formulas derived from theory and simulations is given in table IV.
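As a rough illustration of the dimensional estimates in table III, the sketch below evaluates the four combinations of the order flow rates. The function name and dictionary keys are hypothetical, and the results are scaling estimates only: factors of order unity and the dependence on granularity and tick size are ignored.

```python
def dimensional_estimates(alpha: float, mu: float, delta: float) -> dict:
    """Scaling estimates built from the three order flow rates alone.

    alpha: limit order rate density, shares / (price * time)
    mu:    market order rate, shares / time
    delta: spontaneous removal rate, 1 / time
    """
    return {
        "asymptotic_depth": alpha / delta,            # shares / price
        "spread": mu / alpha,                         # price
        "depth_slope": alpha**2 / (mu * delta),       # shares / price^2
        "price_diffusion": mu**2 * delta / alpha**2,  # price^2 / time
    }


# Example with one of the Fig. 2 parameter sets (alpha = 1, mu = 0.2, delta = 0.001).
estimates = dimensional_estimates(alpha=1.0, mu=0.2, delta=0.001)
```

In this example the slope of the depth profile equals the asymptotic depth divided by the spread, as the table indicates.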

An approximate formula for the mean spread can be derived by noting that it has dimensions of price, and the unique combination of order flow rates with these dimensions is $\mu/\alpha$. While the dimensions indicate the scaling of the spread, they cannot determine multiplicative factors of order unity. A more intuitive argument can be made by noting that inside the spread, removal due to cancellation is dominated by removal due to market orders. Thus the total limit order placement rate inside the spread, which for either buy or sell limit orders is $\alpha s$, must equal the order removal rate $\mu/2$, which implies that the spread is $s = \mu/2\alpha$. As we will see later, this argument can be generalized and made more precise within our mean-field analysis, which then also predicts the observed dependence on the granularity parameter $\epsilon$. However, this dependence is rather weak and only causes a variation of roughly a factor of two for $\epsilon < 1$ (see Figs. 10 and 24), and the factor of 1/2 derived above is a good first approximation. Note that this prediction of the mean spread is just the characteristic price $p_c$.

Quantity                              Scaling relation
Asymptotic depth                      $d = \alpha/\delta$
Spread                                $s = (\mu/\alpha)\, f(\epsilon, dp/p_c)$
Slope of depth profile                $\lambda = (\alpha^2/\mu\delta)\, g(\epsilon, dp/p_c)$
Price diffusion ($\tau \to 0$)        $D_0 = (\mu^2\delta/\alpha^2)\, \epsilon^{-0.5}$
Price diffusion ($\tau \to \infty$)   $D_\infty = (\mu^2\delta/\alpha^2)\, \epsilon^{0.5}$

TABLE IV: The dependence of market properties on model parameters based on simulation and theory. These formulas include corrections for order granularity $\epsilon$ and finite tick size $dp/p_c$. The formula for asymptotic depth from dimensional analysis in table III is exact with zero tick size. The expression for the mean spread is modified by a function of $\epsilon$ and $dp/p_c$, though the dependence on them is fairly weak. For the liquidity $\lambda$, corresponding to the slope of the depth profile near the origin, the dimensional estimate must be modified because the depth profile is no longer linear (mainly depending on $\epsilon$) and so the slope depends on price. The formulas for the volatility are empirical estimates from simulations. The dimensional estimate for the volatility from Table III is modified by a factor of $\epsilon^{-0.5}$ for the early time price diffusion rate and a factor of $\epsilon^{0.5}$ for the late time price diffusion rate.

It is also easy to derive the mean asymptotic depth, which is the density of shares far away from the midpoint. The asymptotic depth is an artificial construct of our assumption of order placement over an infinite interval; it should be regarded as providing a simple boundary condition so that we can study the behavior near the midpoint price. The mean asymptotic depth has dimensions of shares/price, and is therefore given by $\alpha/\delta$. Furthermore, because removal by market orders is insignificant in this regime, it is determined by the balance between order placement and decay, and far from the midpoint the depth at any given price is Poisson distributed. This result is exact.
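The balance between placement and decay far from the midpoint can be checked with a small event-driven simulation of a single price tick. The sketch below is a toy illustration under stated assumptions, not the simulation used in this paper: shares arrive one at a time (unit order size) at a rate `deposit_rate` standing for $\alpha\,dp$, each stored share is removed independently at rate `delta`, and market orders are ignored. For a Poisson distribution the mean and the variance should both approach `deposit_rate / delta`.

```python
import random


def simulate_tick_depth(deposit_rate, delta, n_events, seed=0):
    """Event-driven simulation of the share count at one tick far from the midpoint.

    Returns the time-weighted mean and variance of the number of stored shares.
    """
    rng = random.Random(seed)
    n = 0              # current number of shares at the tick
    total_time = 0.0
    sum_n = 0.0        # time integral of n
    sum_n2 = 0.0       # time integral of n^2
    for _ in range(n_events):
        total_rate = deposit_rate + delta * n
        dt = rng.expovariate(total_rate)  # waiting time to the next event
        total_time += dt
        sum_n += n * dt
        sum_n2 += n * n * dt
        # Choose the event type in proportion to its rate.
        if rng.random() < deposit_rate / total_rate:
            n += 1  # a new share is deposited
        else:
            n -= 1  # one stored share decays
    mean = sum_n / total_time
    variance = sum_n2 / total_time - mean * mean
    return mean, variance


mean_depth, var_depth = simulate_tick_depth(deposit_rate=5.0, delta=1.0, n_events=200_000)
```

With these illustrative rates both `mean_depth` and `var_depth` should come out close to 5, consistent with a Poisson-distributed depth whose mean is the ratio of the placement rate to the decay rate.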

The average slope of the depth profile near the midpoint is an important determinant of liquidity, since it affects the expected price response when a market order arrives. The slope has dimensions of shares/price$^2$, which implies that in terms of the order flow rates it scales roughly as $\alpha^2/\mu\delta$. This is also the ratio of the asymptotic depth to the spread. As we will see later, this is a good approximation when $\epsilon \sim 0.01$, but for smaller values of $\epsilon$ the depth profile is not linear near the midpoint, and this approximation fails.

The last two entries in table IV are empirical estimates for the price diffusion rate $D$, which is proportional to the square of the volatility. That is, for normal diffusion, starting from a point at $t = 0$, the variance $v$ after time $t$ is $v = Dt$. The volatility at any given timescale $t$ is the square root of the variance at timescale $t$. The estimate for the diffusion rate based on dimensional analysis in terms of the order flow rates alone is $\mu^2\delta/\alpha^2$. However, simulations show that short time diffusion is much faster than long time diffusion, due to negative autocorrelations in the price process, as shown in Fig. 11. The initial and the asymptotic diffusion rates appear to obey the scaling relationships given in table IV. Though our mean-field theory is not able to predict this functional form, the fact that early and late time diffusion rates are different can be understood within the framework of our analysis, as described in Sec. III E. Anomalous diffusion of this type implies negative autocorrelations in midpoint prices. Note that we use the term "anomalous diffusion" to imply that the diffusion rate is different on short and long timescales. We do not use this term in the sense that it is normally used in the physics literature, i.e. that the long-time diffusion is proportional to $t^\gamma$ with $\gamma \neq 1$ (for long times $\gamma = 1$ in our case).



B. Varying the granularity parameter $\epsilon$

We first investigate the effect of varying the order granularity $\epsilon$ in the limit $dp \to 0$. As we will see, the granularity has an important effect on most of the properties of the model, and particularly on depth, price impact, and price diffusion. The behavior can be divided into three regimes, roughly as follows:

• Large $\epsilon$, i.e. $\epsilon > 0.1$. This corresponds to a large accumulation of orders at the best bid and ask, nearly linear market impact, and roughly equal short and long time price diffusion rates. This is the regime where the mean-field approximation used in the theoretical analysis works best.

• Medium $\epsilon$, i.e. $\epsilon \sim 0.01$. In this range the accumulation of orders at the best bid and ask is small, and near the midpoint price the depth profile increases nearly linearly with price. As a result, as a crude approximation the price impact increases as roughly the square root of order size.

• Small $\epsilon$, i.e. $\epsilon < 0.001$. The accumulation of orders at the best bid and ask is very small, and near the midpoint the depth profile is a convex function of price. The price impact is very concave. The short time price diffusion rate is much greater than the long time price diffusion rate.

Since the results for bids are symmetric with those for offers about $p = 0$, for convenience we only show the results for offers, i.e. buy market orders and sell limit orders. In this sub-section prices are measured relative to the midpoint, and simulations are in the continuum limit where the tick size $dp \to 0$. The results in this section are from numerical simulations. Also, bear in mind that far from the midpoint the predictions of this model are not valid due to the unrealistic assumption of an order placement process with an infinite domain. Thus the results are potentially relevant to real markets only when the price $p$ is at most a few times as large as the characteristic price $p_c$.

1. Depth profile 

The mean depth profile, i.e. the average number of shares per price interval, and the mean cumulative depth profile are shown in Fig. 3, and the standard deviation of the cumulative profile is shown in Fig. 4. Since the depth profile has units of shares/price, nondimensional units of depth profile are $\hat n = n p_c/N_c = n\delta/\alpha$. The cumulative depth profile at any given time $t$ is defined as

$$N(p,t) = \sum_{\tilde p = 0}^{p} n(\tilde p, t)\,dp. \qquad (2)$$

This has units of shares and so in nondimensional terms is $\hat N(p) = N(p)/N_c = 2\delta N(p)/\mu = N(p)\epsilon/\sigma$.

In the high $\epsilon$ regime the annihilation rate due to market orders is low (relative to $\delta\sigma$), and there is a significant accumulation of orders at the best ask, so that the average depth is much greater than zero at the midpoint. The mean depth profile is a concave function of price. In the medium $\epsilon$ regime the market order removal rate increases, depleting the average depth near the best ask, and the profile is nearly linear over the range $p/p_c < 1$. In the small $\epsilon$ regime the market order removal rate increases even further, making the average depth near the ask very close to zero, and the profile is a convex function over the range $p/p_c < 1$.

The standard deviation of the depth profile is shown in Fig. 4. We see that the standard deviation of the cumulative depth is comparable to the mean depth, and that as $\epsilon$ increases, near the midpoint there is a similar transition from convex to concave behavior.

The uniform order placement process seems at first glance one of the most unrealistic assumptions of our model, leading to depth profiles with a finite asymptotic depth (which also implies that there are an infinite number of orders in the book). However, orders far away from the spread in the asymptotic region almost never get executed and thus do not affect the market dynamics. To demonstrate this, in Fig. 5 we show the comparison between the limit-order depth profile and the depth $n_e$ of only those orders which eventually get executed.$^{4}$ The density $n_e$ of executed orders decreases rapidly as a function of the distance from the mid-price. Therefore we expect that near the midpoint our results should be similar to alternative order placement processes, as long as they also lead to an exponentially decaying profile of executed orders (which is what we observe above). However, to understand the behavior further away from the midpoint we are also working on enhancements that include more realistic order placement processes grounded on empirical measurements of market data, as summarized in section IV B.

FIG. 3: The mean depth profile and cumulative depth versus $\hat p = p/p_c = 2\alpha p/\mu$. The origin $p/p_c = 0$ corresponds to the midpoint. (a) is the average depth profile $n$ in nondimensional coordinates $\hat n = n p_c/N_c = n\delta/\alpha$. (b) is nondimensional cumulative depth $N(p)/N_c$. We show three different values of the nondimensional granularity parameter: $\epsilon = 0.2$ (solid), $\epsilon = 0.02$ (dash), $\epsilon = 0.002$ (dot), all with tick size $dp = 0$.

FIG. 4: Standard deviation of the nondimensionalized cumulative depth versus nondimensional price, corresponding to Fig. 3.

FIG. 5: A comparison between the depth profiles and the effective depth profiles as defined in the text, for different values of $\epsilon$. Heavy lines refer to the effective depth profiles $n_e$ and the light lines correspond to the depth profiles.

FIG. 6: The average price impact corresponding to the results in Fig. 3. The average instantaneous movement of the nondimensional mid-price, $\langle dm\rangle/p_c$, caused by an order of size $N/N_c = N\epsilon/\sigma$. $\epsilon = 0.2$ (solid), $\epsilon = 0.02$ (dash), $\epsilon = 0.002$ (dot).



2. Liquidity for market orders: The price impact function 

In this sub-section we study the instantaneous price impact function $\phi(t, \omega, \tau \to 0)$. This is defined as the (logarithm of the) midpoint price shift immediately after the arrival of a market order in the absence of any other events. This should be distinguished from the asymptotic price impact $\phi(t, \omega, \tau \to \infty)$, which describes the permanent price shift. While the permanent price shift is clearly very important, we do not study it here. The reader should bear in mind that all prices $p$, $a(t)$, etc. are logarithmic.

$^{4}$ Note that the ratio $n_e/n$ is not the same as the probability of filling orders (Fig. 12) because in that case the price $p/p_c$ refers to the distance of the order from the midpoint at the time when it was placed.

The price impact function provides a measure of the liquidity for executing market orders. (The liquidity for limit orders, in contrast, is given by the probability of execution, studied in section II B 5.) At any given time $t$, the instantaneous ($\tau = 0$) price impact function is the inverse of the cumulative depth profile. This follows immediately from equations (1) and (2), which in the limit $dp \to 0$ can be replaced by the continuum transaction equation:

$$\omega = N(p,t) = \int_0^p n(\tilde p, t)\,d\tilde p. \qquad (3)$$



This equation makes it clear that at any fixed $t$ the price impact can be regarded as the inverse of the cumulative depth profile $N(p,t)$. When the fluctuations are sufficiently small we can replace $n(p,t)$ by its mean value $n(p) = \langle n(p,t)\rangle$. In general, however, the fluctuations can be large, and the average of the inverse is not equal to the inverse of the average. There are corrections based on higher order moments of the depth profile, as given in the moment expansion derived in Appendix A 1. Nonetheless, the inverse of the mean cumulative depth provides a qualitative approximation that gives insight into the behavior of the price impact function. (Note that everything becomes much simpler using medians, since the median of the cumulative price impact function is exactly the inverse of the median price impact, as derived in Appendix A 1.)
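The inversion of the cumulative depth can be sketched numerically for a depth profile sampled on discrete ticks. This is a minimal illustration under our own conventions, not the procedure used for the figures: `depth[k]` is assumed to hold the share density on the price interval from `k * dp` to `(k + 1) * dp`, the function names are hypothetical, and the returned price is simply the level at which the cumulative depth first reaches the order size.

```python
from itertools import accumulate


def cumulative_depth(depth, dp):
    """Cumulative shares N(p_k): running sum of n(p_j) * dp over ticks j <= k."""
    return list(accumulate(n * dp for n in depth))


def instantaneous_impact(depth, dp, omega):
    """Price level reached by a market order of omega shares.

    Returns (k + 1) * dp for the first tick k at which the cumulative depth
    reaches omega, or None if the stored depth is exhausted first.
    """
    for k, shares in enumerate(cumulative_depth(depth, dp)):
        if shares >= omega:
            return (k + 1) * dp
    return None


# Illustrative linear depth profile n(p) = slope * p, sampled at tick midpoints.
tick = 0.001
slope = 2.0
linear_depth = [slope * (k + 0.5) * tick for k in range(2000)]
impact_linear = instantaneous_impact(linear_depth, tick, omega=0.5)
```

For this linear profile the cumulative depth is `slope * p**2 / 2`, so the impact of an order of 0.5 shares should be close to `(2 * 0.5 / slope) ** 0.5`, up to one tick of discretization error.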

Mean price impact functions are shown in Fig. 6 and the standard deviation of the price impact is shown in Fig. 7. The price impact exhibits very large fluctuations for all values of $\epsilon$: the standard deviation has the same order of magnitude as the mean, or is even greater for small values of $N\epsilon/\sigma$. Note that these are actually virtual price impact functions. That is, to explore the behavior of the instantaneous price impact for a wide range of order sizes, we periodically compute the price impact that an order of a given size would have caused at that instant, if it had been submitted. We have checked that real price impact curves are the same, but they require a much longer time to accumulate reasonable statistics.

FIG. 7: The standard deviation of the instantaneous price impact $dm/p_c$ corresponding to the means in Fig. 6 as a function of normalized order size $\epsilon N/\sigma$. $\epsilon = 0.2$ (solid), $\epsilon = 0.02$ (dash), $\epsilon = 0.002$ (dot).

FIG. 8: Derivative of the nondimensional mean mid-price movement, with respect to logarithm of the nondimensional order size $N/N_c = N\epsilon/\sigma$, obtained from the price impact curves in Fig. 6.



One of the interesting results in Fig. 6 is the scale of the price impact. The price impact is measured relative to the characteristic price scale $p_c$, which as we have mentioned earlier is roughly equal to the mean spread. As we will argue in relation to Fig. 8, the range of nondimensional shares shown on the horizontal axis spans the range of reasonable order sizes. This figure demonstrates that throughout this range the price impact is of the order of magnitude of (and typically less than) the mean spread size.

Due to the accumulation of orders at the ask in the large $\epsilon$ regime, for small $p$ the mean price impact is roughly linear. This follows from equation (3) under the assumption that $n(p)$ is constant. In the medium $\epsilon$ regime, under the assumption that the variance in depth can be neglected, the mean price impact should increase as roughly $\omega^{1/2}$. This follows from equation (3) under the assumption that $n(p)$ is linearly increasing and $n(0) \approx 0$. (Note that this is a crude approximation, and there can be substantial corrections caused by the variance of the depth profile.) Finally, in the small $\epsilon$ regime the price impact is highly concave, increasing much slower than $\omega^{1/2}$. This follows because $n(0) \approx 0$ and the depth profile $n(p)$ is convex.
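The three cases can be made explicit by inserting simple assumed forms of the mean depth profile into the continuum transaction equation and solving for the impact $\phi$. The constants $n_0$, $\lambda$, $c$ and the exponent $\beta$ below are illustrative, the power-law form for the convex case is our own assumption, and fluctuations of the depth are neglected throughout.

```latex
% Mean-depth approximation: \omega = \int_0^{\phi} n(p)\,dp, solved for \phi.
\begin{align*}
\text{constant depth, } n(p) = n_0 :&\quad
  \omega = n_0\,\phi
  &&\Rightarrow\quad \phi = \frac{\omega}{n_0} \propto \omega, \\
\text{linear depth, } n(p) = \lambda p :&\quad
  \omega = \tfrac{1}{2}\lambda\,\phi^{2}
  &&\Rightarrow\quad \phi = \sqrt{\frac{2\omega}{\lambda}} \propto \omega^{1/2}, \\
\text{convex depth, } n(p) = c\,p^{\beta},\ \beta > 1 :&\quad
  \omega = \frac{c\,\phi^{\beta+1}}{\beta+1}
  &&\Rightarrow\quad \phi = \left(\frac{(\beta+1)\,\omega}{c}\right)^{1/(\beta+1)} \propto \omega^{1/(\beta+1)}.
\end{align*}
```

Since $1/(\beta+1) < 1/2$ whenever $\beta > 1$, a convex profile that vanishes at the origin gives an impact that grows more slowly than $\omega^{1/2}$.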

To get a better feel for the functional form of the price impact function, in Fig. 8 we numerically differentiate it versus log order size, and plot the result as a function of the appropriately scaled order size. (Note that because our prices are logarithmic, the vertical axis already incorporates the logarithm.) If we were to fit a local power law approximation to the function at each price, this corresponds to the exponent of that power law near that price. Notice that the exponent is almost always less than one, so that the price impact is almost always concave. Making the assumption that the effect of the variance of the depth is not too large, so that equation (3) is a good approximation, the behavior of this figure can be understood as follows: For $N/N_c \approx 0$ the price impact is dominated by $n(0)$ (the constant term in the average depth profile) and so the logarithmic slope of the price impact is always near to one. As $N/N_c$ increases, the logarithmic slope is driven by the shape of the average depth profile, which is linear or convex for smaller $\epsilon$, resulting in concave price impact. For large values of $N/N_c$, we reach the asymptotic region where the depth profile is flat (and where our model is invalid by design). Of course, there can be deviations from this behavior caused by the fact that the mean of the inverse depth profile is not in general the inverse of the mean, i.e. $\langle N^{-1}(p)\rangle \neq \langle N(p)\rangle^{-1}$ (see App. A).
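A local power-law exponent of this kind can be estimated from sampled impact values by finite differences. The sketch below is an illustration with a hypothetical function name; it takes logarithms of both the order size and the impact, which is the usual definition of a local exponent and is an assumption on our part about how to apply it to non-logarithmic inputs.

```python
import math


def local_exponents(order_sizes, impacts):
    """Central-difference estimate of d log(impact) / d log(order size).

    Returns one value per interior sample point.
    """
    log_x = [math.log(x) for x in order_sizes]
    log_y = [math.log(y) for y in impacts]
    return [
        (log_y[i + 1] - log_y[i - 1]) / (log_x[i + 1] - log_x[i - 1])
        for i in range(1, len(log_x) - 1)
    ]


# Synthetic check: a square-root impact law has local exponent 1/2 everywhere.
sizes = [10 ** (-3 + 0.1 * k) for k in range(31)]
sqrt_exponents = local_exponents(sizes, [w ** 0.5 for w in sizes])
```

Applied to a measured impact curve, values below one at a given order size indicate that the impact is locally concave there.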



To compare to real data, note that $N/N_c = N\epsilon/\sigma$. $N/\sigma$ is just the order size in shares in relation to the average order size, so by definition it has a typical value of one. For the London Stock Exchange, we have found that typical values of $\epsilon$ are in the range 0.001 to 0.1. For a typical range of order sizes from 100 to 100,000 shares, with an average size of 10,000 shares, the meaningful range for $N/N_c$ is therefore roughly $10^{-5}$ to 1. In this range, for small values of $\epsilon$ the exponent can reach values as low as 0.2. This offers a possible explanation for the previously mysterious concave nature of the price impact function, and contradicts the linear increase in price impact based on the naive argument presented in the introduction.
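The quoted range follows from the endpoints of the order sizes and of $\epsilon$, using the average order size of 10,000 shares as $\sigma$:

```latex
\begin{align*}
\left(\frac{N}{N_c}\right)_{\min} &= \frac{N_{\min}}{\sigma}\,\epsilon_{\min}
  = \frac{100}{10{,}000} \times 0.001 = 10^{-5}, \\
\left(\frac{N}{N_c}\right)_{\max} &= \frac{N_{\max}}{\sigma}\,\epsilon_{\max}
  = \frac{100{,}000}{10{,}000} \times 0.1 = 1 .
\end{align*}
```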



3. Spread 

The probability density of the spread is shown in Fig. 9. This shows that the probability density is substantial at $s/p_c = 0$. (Remember that this is in the limit $dp \to 0$.) The probability density reaches a maximum at a value of the spread of approximately $0.2 p_c$, and then decays. It might seem surprising at first that it decays more slowly for large $\epsilon$, where there is a large accumulation of orders at the ask. However, it should be borne in mind that the characteristic price $p_c = \mu/2\alpha$ depends on $\epsilon$. Since $\epsilon = 2\delta\sigma/\mu$, by eliminating $\mu$ this can be written $p_c = \sigma\delta/(\alpha\epsilon)$. Thus, holding the other parameters fixed, large $\epsilon$ corresponds to small $p_c$, and vice versa. So in fact, the spread is very small for large $\epsilon$, and large for small $\epsilon$, as expected. The figure just shows the small corrections to the large effects predicted by the dimensional scaling relations.

For large $\epsilon$ the probability density of the spread decays roughly exponentially moving away from the midpoint. This is because for large $\epsilon$ the fluctuations around the mean depth are roughly independent. Thus the probability for a market order to penetrate to a given price level is roughly the probability that all the ticks smaller than this price level contain no orders, which gives rise to an exponential decay. This is no longer true for small $\epsilon$. Note that for small $\epsilon$ the probability distribution of the spread becomes insensitive to $\epsilon$, i.e. the nondimensionalized distribution for $\epsilon = 0.02$ is nearly the same as that for $\epsilon = 0.002$.

It is apparent from Fig. 9 that in nondimensional units the mean spread increases with $\epsilon$. This is confirmed in Fig. 10, which displays the mean value of the spread as a function of $\epsilon$. The mean spread increases monotonically with $\epsilon$. It depends on $\epsilon$ as roughly a constant (equal to approximately 0.45 in nondimensional coordinates) plus a linear term whose slope is rather small. We believe that for most financial instruments $\epsilon < 0.3$. Thus the variation in the spread caused by varying $\epsilon$ in the range $0 < \epsilon < 0.3$ is not large, and the dimensional analysis based only on rate parameters given in table IV is a good approximation. We get an accurate prediction of the $\epsilon$ dependence across the full range of $\epsilon$ from the Independent Interval Approximation technique derived in section III G, as shown in Fig. 24.

FIG. 9: The probability density function (a), and cumulative distribution function (b) of the nondimensionalized bid-ask spread $s/p_c$, corresponding to the results in Fig. 3. $\epsilon = 0.2$ (solid), $\epsilon = 0.02$ (dash), $\epsilon = 0.002$ (dot).

FIG. 10: The mean value of the spread in nondimensional units $\hat s = s/p_c$ as a function of $\epsilon$. This demonstrates that the spread only depends weakly on $\epsilon$, indicating that the prediction from dimensional analysis given in table III is a reasonable approximation.



4. Volatility and price diffusion

The price diffusion rate, which is proportional to the square of the volatility, is important for determining risk and is a property of central interest. From dimensional analysis in terms of the order flow rates the price diffusion rate has units of price$^2$/time, and so must scale as $\mu^2\delta/\alpha^2$. We can also make a crude argument for this as follows: The dimensional estimate of the spread (see Table IV) is $\mu/2\alpha$. Let this be the characteristic step size of a random walk, and let the time between steps be the characteristic time $1/\delta$ (which is the average lifetime for a share to be canceled). The step size squared divided by the time per step then scales as the above estimate for the diffusion rate. However, this is not correct in the presence of negative autocorrelations in the step sizes. The numerical results make it clear that there are important $\epsilon$-dependent corrections to this result, as demonstrated below.

FIG. 11: The variance of the change in the nondimensionalized midpoint price versus the nondimensional time delay interval $\tau\delta$. For a pure random walk this would be a straight line whose slope is the diffusion rate, which is proportional to the square of the volatility. The fact that the slope is steeper for short times comes from the nontrivial temporal persistence of the order book. The three cases correspond to Fig. 3: $\epsilon = 0.2$ (solid), $\epsilon = 0.02$ (dash), $\epsilon = 0.002$ (dot).

In Fig. 11 we plot simulation results for the variance of the change in the midpoint price at timescale $\tau$, $\mathrm{Var}(m(t+\tau) - m(t))$. The slope is the diffusion rate, which at any fixed timescale is proportional to the square of the volatility. It appears that there are at least two timescales involved, with a faster diffusion rate for short timescales and a slower diffusion rate for long timescales. Such anomalous diffusion is not predicted by mean-field analysis. Simulation results show that the diffusion rate is correctly described by the product of the estimate from dimensional analysis based on order flow parameters alone, $\mu^2\delta/\alpha^2$, and a $\tau$-dependent power of the nondimensional granularity parameter $\epsilon = 2\delta\sigma/\mu$, as summarized in table IV. We cannot currently explain why this power is $-1/2$ for short term diffusion and $1/2$ for long-term diffusion. However, a qualitative understanding can be gained based on the conservation law we derive in Section III C. A discussion of how this relates to price diffusion is given in Section III E.
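The empirical scaling of the two diffusion rates with the granularity parameter can be evaluated directly. The following is a sketch with a hypothetical function name; it uses the powers $-1/2$ and $1/2$ quoted in the text and omits any prefactors of order unity, so the absolute values are only indicative.

```python
def diffusion_rates(alpha, mu, delta, sigma):
    """Short-time and long-time price diffusion rates from the scaling relations.

    Returns (D_short, D_long), each equal to the dimensional estimate
    mu^2 * delta / alpha^2 times a power of the granularity epsilon.
    """
    epsilon = 2.0 * delta * sigma / mu   # nondimensional granularity
    base = mu**2 * delta / alpha**2      # estimate from order flow rates alone
    return base * epsilon**-0.5, base * epsilon**0.5


# Illustrative parameters: alpha = 1, mu = 0.2, delta = 0.001, sigma = 0.5.
d_short, d_long = diffusion_rates(alpha=1.0, mu=0.2, delta=0.001, sigma=0.5)
```

One consequence of these two powers is that the ratio of the short-time to the long-time rate is $1/\epsilon$, so the gap between the two timescales widens as the orders become finer grained.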



Note that the temporal structure in the diffusion process also implies non-zero autocorrelations of the midpoint price $m(t)$. This corresponds to weak negative autocorrelations in price differences $m(t) - m(t-1)$ that persist for timescales until the variance vs. $\tau$ becomes a straight line. The timescale depends on parameters, but is typically of the order of 50 market order arrival times. This temporal structure implies that there exists an arbitrage opportunity which, when exploited, would make prices more random and the structure of the order flow non-random.
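The link between negatively autocorrelated price differences and a diffusion rate that is larger at short lags can be illustrated with a toy series. The sketch below is not the order book model: `toy_midpoint` is a hypothetical stand-in that adds independent noise to a random walk, which produces negatively autocorrelated increments, and `variance_of_change` estimates the variance of the change at a given lag.

```python
import random


def variance_of_change(series, lag):
    """Sample variance of series[t + lag] - series[t] over all available t."""
    diffs = [series[t + lag] - series[t] for t in range(len(series) - lag)]
    mean = sum(diffs) / len(diffs)
    return sum((d - mean) ** 2 for d in diffs) / (len(diffs) - 1)


def toy_midpoint(n_steps, seed=0):
    """Random walk plus independent noise, a toy series with negatively
    autocorrelated increments (an assumption for illustration only)."""
    rng = random.Random(seed)
    walk = 0.0
    series = []
    for _ in range(n_steps):
        walk += rng.gauss(0.0, 1.0)                # persistent random-walk component
        series.append(walk + rng.gauss(0.0, 1.0))  # transient noise component
    return series


series = toy_midpoint(20_000)
rate_short = variance_of_change(series, 1) / 1    # variance per unit lag at lag 1
rate_long = variance_of_change(series, 50) / 50   # variance per unit lag at lag 50
```

For this toy series the variance at lag $\tau$ is approximately $\tau + 2$ in units of the step variance, so the variance per unit lag is close to 3 at lag 1 and close to 1 at lag 50, which mimics the steeper initial slope seen in the variance plot.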



5. Liquidity for limit orders: Probability and time to fill. 

The liquidity for limit orders depends on the probability that they will be filled, and the time to be filled. This obviously depends on price: Limit orders close to the current transaction prices are more likely to be filled quickly, while those far away have a lower likelihood to be filled. Fig. 12 plots the probability $\Gamma$ of a limit order being filled versus the nondimensionalized price at which it was placed (as with all the figures in this section, this is shown in the midpoint-price centered frame). Fig. 12 shows that in nondimensional coordinates the probability of filling close to the bid for sell limit orders (or the ask for buy limit orders) decreases as $\epsilon$ increases. For large $\epsilon$, this is less than 1 even for negative prices. This says that even for sell orders that are placed close to the best bid there is a significant chance that the offer is deleted before being executed. This is not true for smaller values of $\epsilon$, where $\Gamma(0) \approx 1$. Far away from the spread the fill probabilities as a function of $\epsilon$ are reversed, i.e. the probability for filling limit orders increases as $\epsilon$ increases. The crossover point where the fill probabilities are roughly the same occurs at $p \sim p_c$. This is consistent with the depth profile in Fig. 3, which also shows that depth profiles for different values of $\epsilon$ cross at about $p \sim p_c$.

FIG. 12: The probability $\Gamma$ for filling a limit order placed at a price $p/p_c$ where $p$ is calculated from the instantaneous mid-price at the time of placement. The three cases correspond to Fig. 3: $\epsilon = 0.2$ (solid), $\epsilon = 0.02$ (dash), $\epsilon = 0.002$ (dot).

Similarly, Fig. 13 shows the average time $\tau$ taken to fill an order placed at a distance $p$ from the instantaneous mid-price. Again we see that though the average time is larger at larger values of $\epsilon$ for small $p/p_c$, this behaviour reverses at $p \sim p_c$.



C. Varying tick size $dp/p_c$

The dependence on discrete tick size $dp/p_c$ of the cumulative distribution function for the spread, instantaneous price impact, and mid-price diffusion is shown in Fig. 14. We chose an unrealistically large value of the tick size, with $dp/p_c = 1$, to show that, even with very coarse ticks, the qualitative changes in behavior are typically relatively minor.

Fig. 14(a) shows the cumulative distribution function of the spread, comparing $dp/p_c \to 0$ and $dp/p_c = 1$. It is apparent from this figure that the spread distribution for coarse ticks "effectively integrates" the distribution in the limit $dp \to 0$. That is, at integer tick values the mean cumulative depth profiles roughly match, and in between integer tick values, for coarse ticks the probability is smaller. This happens for the obvious reason that coarse ticks quantize the possible values of the spread, and place a lower limit of one tick on the value the spread can take. The shift in the mean spread from this effect is not shown, but it is consistent with this result; there is a constant offset of roughly 1/2 tick.

FIG. 13: The average time $\tau$, nondimensionalized by the rate $\delta$, to fill a limit order placed at a distance $p/p_c$ from the instantaneous mid-price.

The alteration in the price impact is shown in Fig. 14(b). Unlike the spread distribution, the average price impact varies continuously. Even though the tick size is quantized, we are averaging over many events and the probability of a price impact of each tick size is a continuous function of the order size. Large tick size consistently lowers the price impact. The price impact rises more slowly for small $p$, but is then similar except for a downward translation.

The effect of coarse ticks is less trivial for mid-price diffusion, as shown in Fig. 14(c). At $\epsilon = 0.002$, coarse ticks remove most of the rapid short-term volatility of the midpoint, which in the continuous-price case arises from price fluctuations smaller than $dp/p_c = 1$. This lessens the negative autocorrelation of midpoint price returns, and reduces the anomalous diffusion. At $\epsilon = 0.2$, where both early volatility and late negative autocorrelation are smaller, coarse ticks have less effect. The net result is that the mid-price diffusion becomes less sensitive to the value of $\epsilon$ as tick size increases, and there is less anomalous price diffusion.

FIG. 14: Dependence of market properties on tick size. Heavy lines are $dp/p_c \to 0$; light lines are $dp/p_c = 1$. Cases correspond to Fig. 3 with $\epsilon = 0.2$ (solid), $\epsilon = 0.02$ (dash), $\epsilon = 0.002$ (dot). (a) is the cumulative distribution function for the nondimensionalized spread. (b) is instantaneous nondimensionalized price impact. (c) is diffusion of the nondimensionalized midpoint shift, corresponding to Fig. 11.



III. THEORETICAL ANALYSIS 

A. Summary of analytic methods 

We have investigated this model analytically using two
approaches. The first one is based on a master equation,
given in Section III F. This approach works best in the
midpoint-centered frame. Here we attempt to solve directly
for the average number of shares at each price tick
as a function of price. The midpoint price makes a random
walk with a nonstationary distribution. Thus the
key to finding a stationary analytic solution for the average
depth is to use comoving price coordinates, which are
centered on a reference point near the center of the book,
such as the midpoint or the best bid. In the first approximation,
fluctuations about the mean depth at adjacent
prices are treated as independent. This allows us to replace
the distribution over depth profiles with a simpler
probability density over occupation numbers $n$ at each $p$
and $t$. We can take a continuum limit by letting the tick
size $dp$ become infinitesimal. With finite order flow rates,
this gives vanishing probability for the existence of more
than one order at any tick as $dp \to 0$. This is described in
detail in Section III F 3. With this approach we are able
to test the relevance of correlations as a function of the
parameter $\epsilon$, as well as predict the functional dependence
of the cumulative distribution of the spread on the depth
profile. Correlations turn out to be negligible for large
values of $\epsilon$ ($\epsilon \approx 0.2$), while they are very important for
small values ($\epsilon \approx 0.002$).

Our second analytic approach, which we term the Independent
Interval Approximation (IIA), is most easily
carried out in the bid-centered frame and is described
in Section III G. This approach uses a different representation,
in which the solution is expressed in terms of
the empty intervals between non-empty price ticks. The
system is characterized at any instant of time by a set
of intervals $\{\ldots, x_{-1}, x_0, x_1, x_2, \ldots\}$ where, for example, $x_0$ is
the distance between the bid and the ask (the spread),
$x_{-1}$ is the distance between the second buy limit order
and the bid, and so on (see Fig. 15). Equations are
written for how a given interval varies in time. Changes
to adjacent intervals are related, giving us an infinite set
of coupled nonlinear equations. However, using a mean-field
approximation we are able to solve the equations,
albeit only numerically. Besides predicting how the various
intervals (for example the spread) vary with the parameters,
this approach also predicts the depth profiles
as a function of the parameters. The predictions from the
IIA are compared to data from numerical simulations in
Section III G 2. They match very well for large $\epsilon$ and less
well for smaller values of $\epsilon$. The IIA can also be modified
to incorporate various extensions to the model, as
mentioned in Section III G 2.



In both approaches, we use a mean-field approximation
to get a solution. The approximation lies
in assuming that fluctuations in adjacent intervals (which
might be adjacent price ranges in the master equation approach
or adjacent empty intervals in the IIA) are independent.
Also, both approaches are most easily tractable
only in the continuum limit $dp \to 0$, when every tick has
at most one order. They may, however, be extended
to general tick size as well. This is explained in the appendix
for the master equation approach.

Because correlations are important for small $\epsilon$, both
methods work well mostly in the large $\epsilon$ limit, though
qualitative aspects of small $\epsilon$ behavior may also be
gleaned from them. Unfortunately, at least based on
our preliminary investigation of London Stock Exchange
data, it seems that it is this small $\epsilon$ limit that real markets
may tend more towards. So our approximate solutions
may not be as useful as we would like. Nonetheless, they
do provide some conceptual insights into what determines
depth and price impact.

In particular, we find that the shape of the mean depth
profile depends on a single parameter $\epsilon$, and that the relative
sizes of its first few derivatives account for both
the order size-dependence of the market impact, and the
renormalization of the midpoint diffusivity. A higher relative
rate of market versus limit orders depletes the center
of the book, though less than the classical estimate
predicts. This leads to more concave impact (explaining
Fig. |^) and faster short-term diffusivity. However,
the orders pile up more quickly (versus classically nondimensionalized
price) with distance from the midpoint,
causing the rapid early diffusion to suffer larger mean
reversion. These are the effects shown in Fig. 11. We
will elaborate on the above remarks in the following sections.
However, the qualitative relation of impact to midpoint
autocorrelation supplies a potential interpretation
of data, which may be more robust than details of the
model assumptions or its quantitative results.

Both of the treatments described above are approximations.
We can derive an exact global conservation law
of order placement and removal, whose consequences we
elaborate in Section III C. This conservation law must
be respected in any sensible analysis of the model, giving
us a check on the approximations. It also provides
some insight into the anomalous diffusion properties of
this model.



B. Characterizing limit-order books: dual coordinates



We begin with the assumption of a price space. Price is 
a dimensional quantity, and the space is divided into bins 
of length dp representing the ticks, which may be finite 
or infinitesimal. Prices are then discrete or continuous- 
valued, respectively. 

Statistical properties of interest are computed from
temporal sequences or ensembles of limit-order book configurations.
If $n$ is the variable used to denote the number
of shares from limit orders in some bin $(p, p + dp)$
at the beginning $t$ of an elementary time interval, a configuration
is specified by a function $n(p,t)$. It is convenient
to take $n$ positive for sell limit orders, and negative
for buy limit orders. Because the model dynamics precludes
crossing limit orders, there is in general a highest
instantaneous buy limit-order price, called the bid
$b(t)$, and a lowest sell limit-order price, the ask $a(t)$,
with $b(t) < a(t)$ always. The midpoint price, defined as
$m(t) = [a(t) + b(t)]/2$, may or may not be the price of
any actual bin if prices are discrete ($m(t)$ may be a half-integer
multiple of $dp$). These quantities are diagrammed
in Fig. 15.

FIG. 15: The price space and order profile. $n(p,t)$ has been
chosen to be $0$ or $\pm 1$, a restriction that will be convenient
later. Price bins are labeled by their lower boundary price,
and intervals $x(N)$ will be defined below.

[Figure 16: axes $N$ versus price, with marked prices $b$, $b+dp$, $a$, $a+dp$ and labels $n(b)$, $n(a)$.]

FIG. 16: The accumulated order number $N(p,t)$. $N(a,t) = 0$,
because contributions from all bins cancel in the two sums.
$N$ remains zero down to $b(t) + dp$, because there are no uncanceled,
nonzero terms. $N(b,t)$ becomes negative, because
the second sum in Eq. (4) now contains $n(b,t)$, not canceled
by the first.

An equivalent specification of a limit-order book configuration
is given by the cumulative order count

$$ N(p,t) = \sum_{p'=-\infty}^{p-dp} |n(p',t)| - \sum_{p'=-\infty}^{a-dp} |n(p',t)|, \tag{4} $$

where $-\infty$ denotes the lower boundary of the price space,
whose exact value must not affect the results. (Because
by definition there are no orders between the bid and ask,
the bid could equivalently have been used as the origin
of summation. Because price bins will be indexed here
by their lower boundaries, though, it is convenient here
to use the ask.) The absolute values have been placed so
that $N$, like $n$, is negative in the range of buy orders and
positive in the range of sells. The construction of $N(p,t)$
is diagrammed in Fig. 16.

In many cases of either sparse orders or infinitesimal
$dp$, with fixed order size (which we may as well define to
be one share), there will be either zero or one share in any
single bin, and Eq. (4) will be invertible to an equivalent
specification of the limit-order book configuration

$$ p(N,t) = \max \{ p \mid N(p,t) = N \}, \tag{5} $$

shown in Fig. 17. (Strictly, the inversion may be performed
for any distribution of order sizes, but the resulting
function is intrinsically discrete, so its domain is
only invariant when order size is fixed. To give $p(N,t)$
the convenient properties of a well-defined function on an
invariant domain, this will be assumed below.)

[Figure 17: axes price versus $N$, with marked prices $b$ and $a+dp$.]

FIG. 17: The inverse function $p(N,t)$. The function is in
general defined only on discrete values of $N$, so this domain
is only invariant when order size is fixed, a convenience that
will be assumed below. Between the discrete domain, and the
definition of $p$ as a maximum, the inverse function effectively
interpolates between vertices of the reflected image of $N(p,t)$,
as shown by the dotted line.

With definition (5), $p(0,t) = a(t)$, $p(-1,t) = b(t)$,
and one can define the intervals between orders as

$$ x(N,t) = p(N,t) - p(N-1,t). \tag{6} $$



Thus $x(0,t) = a(t) - b(t)$, the instantaneous bid-ask
spread. The lowest values of $x(N,t)$ bracketing
the spread are shown in Fig. 15. For symmetric
order-placement rules, probability distributions over configurations
will be symmetric under either $n(p,t) \to
-n(-p,t)$, or $x(N,t) \to x(-N,t)$. Coordinates $N$ and
$p$ furnish a dual description of configurations, and $n$ and
$x$ are their associated differences. The Master Equation
approach of Section III F assumes independent fluctuation
in $n$, while the Independent Interval Approximation
of Section III G assumes independent fluctuation in
$x$. (In this section, it will be convenient to abbreviate
$x(N,t) = x_N(t)$.)
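
The dual description of Eqs. (4)-(6) can be made concrete with a short numerical sketch. The code below is illustrative only: the function names, the convention that array index i stands for the price bin i dp (with dp = 1), and the example book are all assumptions introduced here, not part of the model specification.

```python
import numpy as np

def cumulative_count(n, ask_idx):
    """N(p) of Eq. (4): sum of |n| below p minus sum of |n| below the ask."""
    # below[i] = sum_{j < i} |n[j]|, i.e. the sum up to bin p - dp.
    below = np.concatenate(([0], np.cumsum(np.abs(n))[:-1]))
    return below - below[ask_idx]

def inverse_price(N_of_p, N):
    """p(N) of Eq. (5): the largest bin index whose cumulative count equals N."""
    hits = np.flatnonzero(N_of_p == N)
    return int(hits.max())

# Hypothetical book with unit order size: buys (-1) at bins 1 and 3,
# sells (+1) at bins 6 and 9; the bid is bin 3 and the ask is bin 6.
n = np.array([0, -1, 0, -1, 0, 0, 1, 0, 0, 1, 0, 0])
bid_idx, ask_idx = 3, 6

N_of_p = cumulative_count(n, ask_idx)
p = {N: inverse_price(N_of_p, N) for N in (-2, -1, 0, 1)}
x = {N: p[N] - p[N - 1] for N in (-1, 0, 1)}   # intervals of Eq. (6)
```

In this example p[0] and p[-1] recover the ask and the bid, and x[0] is the spread, as stated in the text.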



C. Frames and marginals 

The $x(N,t)$ specification of limit-order book configurations
has the property that its distribution is stationary
under the dynamics considered here. The same is not
true for $p(N,t)$ or $n(p,t)$ directly, because bid, midpoint,
and ask prices undergo a random walk, with a renormalized
diffusion coefficient. Stationary distributions for $n$-variables
can be obtained in co-moving frames, of which
there are several natural choices.

The bid-centered configuration is defined as

$$ n_b(p,t) = n(p - b(t), t). \tag{7} $$

If an appropriate rounding convention is adopted in the
case of discrete prices, a midpoint-centered configuration
can also be defined, as

$$ n_m(p,t) = n(p - m(t), t). \tag{8} $$



The midpoint-centered configuration has qualitative differences
from the bid-centered configuration, which will
be explored below. Both give useful insights into the order
distribution and diffusion processes. The ask-centered
configuration, $n_a(p,t)$, need not be considered if order
placement and removal are symmetric, because it is a
mirror image of $n_b(p,t)$.

The spread is defined as the difference $s(t) = a(t) -
b(t)$, and is the value of the ask in bid-centered coordinates.
In midpoint-centered coordinates, the ask appears
at $s(t)/2$.

The configurations $n_b$ and $n_m$ are dynamically correlated
over short time intervals, but evolve ergodically in
periods longer than finite characteristic correlation times.
Marginal probability distributions for these can therefore
be computed as time averages, either as functions on the
whole price space, or at discrete sets of prices. Their
marginal mean values at a single price $p$ will be denoted
$\langle n_b(p) \rangle$, $\langle n_m(p) \rangle$, respectively.

These means are subject to global balance constraints,
between total order placement and removal in the price
space. Because all limit orders are placed above the bid,
the bid-centered configuration obeys a simple balance relation:

$$ \frac{\mu}{2} = \sum_{p=b+dp}^{\infty} \left( \alpha - \delta \langle n_b(p) \rangle \right). \tag{9} $$

Eq. (9) says that buy market orders must account, on average,
for the difference between all limit orders placed,
and all decays. After passing to nondimensional coordinates
below, this will imply an inverse relation between
corrections to the classical estimate for diffusivity at early
and late times, discussed in Sec. III E. In addition, this
conservation law plays an important role in the analysis
and determination of the $x(N,t)$'s, as we will see later in
the text.

The midpoint-centered averages satisfy a different constraint:

$$ \frac{\mu}{2} = \frac{\alpha \langle s \rangle}{2} + \sum_{p=b+dp}^{\infty} \left( \alpha - \delta \langle n_m(p) \rangle \right). \tag{10} $$

Market orders in Eq. (10) account not only for the excess
of limit order placement over evaporation at prices
above the midpoint, but also the "excess" orders placed
between $b(t)$ and $m(t)$. Since these always lead to midpoint
shifts, they ultimately appear at positive comoving
coordinates, altering the shape of $\langle n_m(p) \rangle$ relative
to $\langle n_b(p) \rangle$. Their rate of arrival is $\alpha \langle m - b \rangle = \alpha \langle s \rangle / 2$.
These results are also confirmed in simulations.



D. Factorization tests 

Whether in the bid-centered frame or the midpoint-centered
frame, the probability distribution function for
the entire configuration $n(p)$ is too difficult a problem
to solve in its entirety. However, an approximate master
equation can be formed for $n$ independently at each $p$ if
all joint probabilities factor into independent marginals,

$$ \Pr\left( \{ n(p_i) \}_i \right) = \prod_i \Pr\left( n(p_i) \right), \tag{11} $$

where $\Pr$ denotes, for instance, a probability density for
$n$ orders in some interval around $p$.

Whenever orders are sufficiently sparse that the expected
number in any price bin is simply the probability
that the bin is occupied (up to a constant of proportionality),
the independence assumption implies a relation
between the cumulative distribution for the spread
of the ask and the mean density profile. In units where
the order size is one, the relation is

$$ \Pr(s/2 < p) = 1 - \exp\left( - \sum_{p'=0}^{p-dp} \langle n_m(p') \rangle \right). \tag{12} $$

This relation is tested against simulation results in
Fig. 18. One can observe that there are three regimes.

A high-$\epsilon$ regime is defined when the mean density profile
at the midpoint $\langle n_m(0) \rangle < 1$, and strongly concave
downward. In this regime, the approximation of independent
fluctuations is excellent, and a master equation
treatment is expected to be useful. Intermediate-$\epsilon$ is defined
by $\langle n_m(0) \rangle \ll 1$ and nearly linear, and the approximation
of independence is marginal. Small-$\epsilon$ is defined
by $\langle n_m(0) \rangle \ll 1$ and concave upward, and the approximation
of independent fluctuations is completely invalid.
These regimes of validity correspond also to the qualitative
ranges noted already in Sec. II B.
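
The factorized relation of Eq. (12) is simple to evaluate from a measured density profile. The following is a minimal sketch, assuming the profile is stored as an array of per-bin means in the midpoint-centered frame, starting at the midpoint; the function name, the example profile, and the choice to sum over bins strictly below p are assumptions made for illustration.

```python
import numpy as np

def spread_cdf_from_density(mean_density):
    """Pr(s/2 < p) at each bin p under independent fluctuations (Eq. 12).

    mean_density[i] is the mean number of sell orders <n_m> in bin i of the
    midpoint-centered frame, in units where the order size is one.
    """
    mean_density = np.asarray(mean_density, dtype=float)
    # Sum of <n_m(p')> over bins strictly below p (zero for the first bin).
    below = np.concatenate(([0.0], np.cumsum(mean_density)[:-1]))
    return 1.0 - np.exp(-below)

# Hypothetical profile: density rising linearly from 0 to 0.2.
profile = np.linspace(0.0, 0.2, 50)
cdf = spread_cdf_from_density(profile)
```

Comparing such a computed curve with the spread CDF measured directly in simulation is the test described above.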



In the bid-centered frame, however, Eq. (12) never seems
to be valid for any range of parameters. We will discuss
later why this might be so. For the present, therefore, the
master equation approach is carried out in the midpoint-centered
frame. Alternatively, the mean-field theory of
the separations is most convenient in the bid-centered
frame, so that frame will be studied in the dual basis.
The relation of results in the two frames, and via the two
methods of treatment, will provide a good qualitative,
and for some properties quantitative, understanding of
the depth profile and its effect on impacts.

It is possible in a modified treatment to match certain
features of simulations at any $\epsilon$, by limited incorporation
of correlated fluctuations. However, the general
master equation will be developed independent of these,
and tested against simulation results at large $\epsilon$, where its
defining assumptions are well met.

FIG. 18: CDFs $\Pr(s/2 < p)$ from simulations (thin solid),
mean density profile $\langle n_m(p) \rangle$ from simulations (thick solid),
and computed CDF of spread (thin dashed) from $\langle n_m(p) \rangle$,
under the assumption of uncorrelated fluctuations, at three
values of $\epsilon$. (a): $\epsilon = 0.2$ (low market order rate); approximation
is very good. (b): $\epsilon = 0.02$ (intermediate market order
rate); approximation is marginal. (c): $\epsilon = 0.002$ (high market
order rate); approximation is very poor.



E. Comments on renormalized diffusion 

A qualitative understanding of why the diffusivity is
different over short and long time scales, as well as why
it may depend on $\epsilon$, may be gleaned from the following
observations.

First, global order conservation places a strong constraint
on the classically nondimensionalized density profile
in the bid-centered frame. We have seen that at
$\epsilon \ll 1$, the density profile becomes concave upward near
the bid, accounting for an increasing fraction of the allowed
"remainder area" as $\epsilon \to 0$ (see Figs. || and p8|).
Since this remainder area is fixed at unity, it can be conserved
only if the density profile approaches one more
quickly with increasing price. Low density at low price
appears to lead to more frequent persistent steps in the
effective short-term random walk, and hence large short-term
diffusivity. However, increased density far from the
bid indicates less impact from market orders relative to
the relaxation time of the Poisson distribution, and thus
a lower long-time diffusivity.

The qualitative behavior of the bid-centered density 
profile is the same as that of the midpoint-centered pro- 
file, and this is expected because the spread distribution 
is stationary, rather than diffusive. In other words, the 
only way the diffusion of the bid or ask can differ from 
that of the midpoint is for the spread to either increase 
or decrease for several succeeding steps. Such autocorre- 
lation of the spread cannot accumulate with time if the 
spread itself is to have a stationary distribution. Thus, 
the shift in the midpoint over some time interval can only 
differ from that of the bid or ask by at most a constant, 
as a result of a few correlated changes in the spread. This 
difference cannot grow with time, however, and so does 
not affect the diffusivity at long times. 

Indeed, both of the predicted corrections to the classical
estimate for diffusivity are seen in simulation results
for midpoint diffusion. The simulation results, however,
show that the implied autocorrelations change the diffusivity
by factors of $\sqrt{\epsilon}$, suggesting that these corrections
require a more subtle derivation than the one attempted
here. This will be evidenced by the difficulty
of obtaining a source term $S$ in density coordinates (Section
III F), which satisfies both the global order conservation
law, and the proper zero-price boundary condition,
in the midpoint-centered frame.

An interesting speculation is that the subtlety of these
correlations also causes the density $n(p,t)$ in bid-centered
coordinates not to approximate the mean-field condition
at any of the parameters studied here, as noted in
Sec. III D. Since short-term and long-term diffusivity corrections
are related by a hard constraint, the difficulty
of producing the late-time density profile should match
that of producing the early-time profile. The midpoint-centered
profile is potentially easier, in that the late-time
complexity must be matched by a combination of the
early-time density profile and the scaling of the expected
spread. It appears that the complex scaling is absorbed
in the spread, as per Fig. [lC] and Fig. 24, leaving a density
that can be approximately calculated with the methods
used here.

F. Master equations and mean-field approximations

There are two natural limits in which functional configurations
may become simple enough to be tractable probabilistically,
with analytic methods. They correspond to
mean-field theories in which fluctuations of the dual differentials
of either $N(p,t)$ or $p(N,t)$ are independent.
In the first case, probabilities may be defined for any
density $n(p,t)$ independently at each $p$, and in the second
for the separation intervals $x(N,t)$ at each $N$. The
mean-field theory from the first approximation will be
solved in Subsec. III F 1, and that from the second in
Subsec. III G. As mentioned above, because the fluctuation
independence approximation is only usable in a
midpoint-centered frame, $n(p,t)$ will refer always to this
frame. $x(N,t)$ is well-defined without reference to any
frame.



1. A number density master equation 

If share-number fluctuations are independent at different
$p$, a density $\pi(n,p,t)$ may be defined, which gives the
probability to find $n$ orders in bin $(p, p + dp)$, at time $t$.
The normalization condition defining it as a probability
density is

$$ \sum_n \pi(n,p,t) = 1, \tag{13} $$

for each bin index $p$ and at every $t$. The index $t$ will
be suppressed henceforth in the notation since we are
looking for time-independent solutions.

Supposing an arbitrary density of order-book configurations
$\pi(n,p)$ at time $t$, the stochastic dynamics of
the configurations causes probability to be redistributed
according to the master equation

$$
\begin{aligned}
\frac{\partial}{\partial t} \pi(n,p) = {} & \frac{\alpha(p)\,dp}{\sigma} \left[ \pi(n-\sigma,p) - \pi(n,p) \right] \\
& + \frac{\delta}{\sigma} \left[ (n+\sigma)\,\pi(n+\sigma,p) - n\,\pi(n,p) \right] \\
& + \frac{\mu(p)}{2\sigma} \left[ \pi(n+\sigma,p) - \pi(n,p) \right] \\
& + \sum_{\Delta p} P_+(\Delta p) \left[ \pi(n,p-\Delta p) - \pi(n,p) \right] \\
& + \sum_{\Delta p} P_-(\Delta p) \left[ \pi(n,p+\Delta p) - \pi(n,p) \right].
\end{aligned}
\tag{14}
$$

Here $\partial \pi(n,p)/\partial t$ is a continuum notation for
$[\pi(n,p,t+\delta t) - \pi(n,p,t)]/\delta t$, where $\delta t$ is an elementary
time step, chosen short enough that at most
one event alters any typical configuration. Eq. (14)
represents a general balance between additions and removals,
without regard to the meaning of $n$. Thus, $\alpha(p)$
is a function that must be determined self-consistently
with the choice of frame. As an example of how this
works, in a bid-centered frame, $\alpha(p)$ takes a fixed value
$\alpha(\infty)$ at all $p$, because the deposition rate is independent
of position and frame shifts. The midpoint-centered
frame is more complicated, because depositions below
the midpoint cause shifts that leave the deposited
order above the midpoint. The specific consequence for
$\alpha(p)$ in this case will be considered below. $\mu(p)/2$ is,
similarly, the rate of market orders surviving to cancel
limit orders at price $p$. $\mu(p)/2$ decreases from $\mu(0)/2$
at the ask (for buy market orders, because $\mu$ total orders
are divided evenly between buys and sells) to zero as
$p \to \infty$, as market orders are screened probabilistically
by intervening limit orders. $\alpha(\infty)$ and $\mu(0)$ are thus the
parameters $\alpha$ and $\mu$ of the simulation.

The lines of Eq. (14) correspond to the following
events. The term proportional to $\alpha(p)\,dp/\sigma$ describes
depositions of discrete orders at that rate (because $\alpha$ is
expressed in shares per price per time), which raise configurations
from $n - \sigma$ to $n$ shares at price $p$. The term
proportional to $\delta$ comes from deletions and has the opposite
effect, and is proportional to $n/\sigma$, the number of
orders that can independently decay. The term proportional
to $\mu(p)/2\sigma$ describes market order annihilations.
For general configurations, the preceding three effects
may lead to shifts of the origin by arbitrary intervals $\Delta p$,
and $P_\pm$ are for the moment unknown distributions over
the frequency of those shifts. They must be determined
self-consistently with the configuration of the book which
emerges from any solution to Eq. (14).
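
To see how the first three lines of Eq. (14) balance in a single bin, one can drop the frame-shift terms and solve the resulting birth-death chain directly. The sketch below does this; neglecting the shifts, the function name, the truncation at k_max orders, and the parameter values are all assumptions made for illustration, and market orders are taken to act only on occupied bins.

```python
import numpy as np

def stationary_occupancy(alpha_dp, delta, mu, sigma=1.0, k_max=200):
    """Stationary pi(k) for k = n/sigma orders in one bin, with no frame shifts.

    Up-rate:   alpha_dp / sigma            (deposition)
    Down-rate: delta * k + mu / (2 sigma)  (cancellation plus market orders)
    """
    up = alpha_dp / sigma
    weights = np.zeros(k_max + 1)
    weights[0] = 1.0
    for k in range(k_max):
        down = delta * (k + 1) + mu / (2.0 * sigma)
        weights[k + 1] = weights[k] * up / down   # detailed balance
    return weights / weights.sum()

alpha_dp, delta, mu, sigma = 0.5, 1.0, 0.8, 1.0   # hypothetical rates
pi = stationary_occupancy(alpha_dp, delta, mu, sigma)
mean_n = sigma * np.dot(np.arange(pi.size), pi)
# Balance of mean flows: <n> = alpha dp/delta - (mu / 2 delta) (1 - pi_0).
predicted = alpha_dp / delta - mu / (2.0 * delta) * (1.0 - pi[0])
```

The agreement of mean_n with predicted reflects the balance of deposition against cancellation and market-order removal, with the shift terms left out.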

A limitation of the simple product representation of
frame shifts is that it assumes that whole order-book
configurations are transported under $p \pm \Delta p \to p$, independently
of the value of $n(p)$. As long as fluctuations
are independent, this is a good approximation for orders
at all $p$ which are not either the bid or the ask, either
before or after the event that causes the shift. The correlations
are never ignorable for the bins which are the
bid and ask, though, and there is some distribution of
instances in which any $p$ of interest plays those parts.
Approximate methods to incorporate those correlations
will require replacing the product form with a sum of
products conditioned on states of the order book, as will
be derived below.

The important point is that the order-flow dependence
of Eq. (14) is independent of these self-consistency requirements,
and may be solved by use of a generating functional
at general $\alpha(p)$, $\mu(p)$, and $P_\pm$. The solution, exact
but not analytically tractable at general $dp$, will be
derived in closed form in the next subsection. It has a
well-behaved continuum limit at $dp \to 0$, however, which
is analytically tractable, so that special case will be considered
in the following subsection.

2. Solution by generating functional

The moment generating functional for $\pi$ is defined for
a parameter $\lambda \in [0, 1]$, as

$$ \Pi(\lambda,p) = \sum_{n/\sigma = 0}^{\infty} \lambda^{n/\sigma}\, \pi(n,p). \tag{15} $$

Introducing a shorthand for its value at $\lambda = 0$,

$$ \Pi(0,p) = \pi(0,p) = \pi_0(p), \tag{16} $$

while the normalization condition (13) for probabilities
gives

$$ \Pi(1,p) = 1, \quad \forall p. \tag{17} $$

By definition of the average of $n(p)$ in the distribution
$\pi$, denoted $\langle n(p) \rangle$,

$$ \left. \frac{\partial}{\partial \lambda} \Pi(\lambda,p) \right|_{\lambda \to 1} = \frac{\langle n(p) \rangle}{\sigma}, \tag{18} $$

and because $\Pi$ will be regular in some sufficiently small
neighborhood of $\lambda = 1$, one can expand

$$ \Pi(\lambda,p) = 1 + (\lambda - 1) \frac{\langle n(p) \rangle}{\sigma} + O\left( (\lambda-1)^2 \right). \tag{19} $$
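
The properties in Eqs. (16)-(18) are easy to check numerically for any trial distribution. A minimal sketch follows, assuming a truncated Poisson occupancy distribution chosen purely for illustration; the function name and the finite-difference step are likewise assumptions.

```python
import math

def generating_function(pi, lam):
    """Pi(lambda) = sum_k lambda**k * pi[k], with k = n/sigma (Eq. 15)."""
    return sum(prob * lam**k for k, prob in enumerate(pi))

# Hypothetical occupancy distribution: Poisson with mean 0.7, truncated at k = 39.
mean_k = 0.7
pi = [math.exp(-mean_k) * mean_k**k / math.factorial(k) for k in range(40)]

norm = generating_function(pi, 1.0)              # Eq. (17): equals 1
pi_0 = generating_function(pi, 0.0)              # Eq. (16): equals pi[0]
h = 1e-6
# One-sided difference from below, since lambda is restricted to [0, 1].
slope = (generating_function(pi, 1.0) - generating_function(pi, 1.0 - h)) / h
```

Here slope approximates the derivative in Eq. (18), which should equal the mean occupancy in units of the order size.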

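Multiplying Eq. (14) by $\lambda^{n/\sigma}$ and summing over $n$
(and suppressing the argument $p$ in the notation everywhere;
$\alpha(p)$ or $\alpha(0)$ will be used where the distinction
of the function from its boundary value is needed), the
stationary solution for $\Pi$ must satisfy

$$
\begin{aligned}
0 = {} & \frac{\lambda - 1}{\sigma} \left\{ \alpha\,dp\,\Pi - \delta\sigma \frac{\partial \Pi}{\partial \lambda} - \frac{\mu}{2\lambda} \left( \Pi - \Pi_0 \right) \right\} \\
& + \sum_{\Delta p} P_+(\Delta p) \left[ \Pi(\lambda, p - \Delta p) - \Pi(\lambda, p) \right] \\
& + \sum_{\Delta p} P_-(\Delta p) \left[ \Pi(\lambda, p + \Delta p) - \Pi(\lambda, p) \right].
\end{aligned}
\tag{20}
$$

Here $\Pi_0$ denotes $\Pi(0,p) = \pi_0(p)$ of Eq. (16).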
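
For readers who want the intermediate step, the three terms inside the braces of Eq. (20) follow from shift identities for the sum over $n$ (taken over non-negative multiples of $\sigma$). The worked step below is a reconstruction, under the assumption that the market-order loss term is absent at $n = 0$, since an empty bin cannot be annihilated; that is what produces the subtraction of $\pi_0$.

```latex
\begin{align*}
\sum_n \lambda^{n/\sigma}\left[\pi(n-\sigma) - \pi(n)\right]
  &= (\lambda - 1)\,\Pi, \\
\sum_n \lambda^{n/\sigma}\left[(n+\sigma)\,\pi(n+\sigma) - n\,\pi(n)\right]
  &= -(\lambda - 1)\,\sigma\,\frac{\partial \Pi}{\partial \lambda}, \\
\sum_n \lambda^{n/\sigma}\,\pi(n+\sigma) - \sum_{n \ge \sigma} \lambda^{n/\sigma}\,\pi(n)
  &= -\frac{\lambda - 1}{\lambda}\left(\Pi - \pi_0\right).
\end{align*}
```

Multiplying these by $\alpha\,dp/\sigma$, $\delta/\sigma$, and $\mu/2\sigma$ respectively reproduces the first line of Eq. (20).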

Only the symmetric case with no net drift will be considered
here for simplicity, which requires $P_+(\Delta p) =
P_-(\Delta p) = P(\Delta p)$. In a Fokker-Planck expansion, the
(unrenormalized) diffusivity of whatever reference price
is used as coordinate origin is related to the distribution
$P$ by

$$ D = \sum_{\Delta p} P(\Delta p)\, \Delta p^2. \tag{21} $$

The rate at which shift events happen is

$$ R = \sum_{\Delta p} P(\Delta p), \tag{22} $$

and the mean shift amount appearing at linear order in
derivatives (relevant at $p \to 0$) is

$$ \langle \Delta p \rangle = \frac{\sum_{\Delta p} P(\Delta p)\, \Delta p}{\sum_{\Delta p} P(\Delta p)}. \tag{23} $$
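
The three moments of Eqs. (21)-(23) can be computed directly once a shift distribution is specified. The following sketch assumes the distribution is supplied as a mapping from shift size to rate; the function name and the example values are hypothetical.

```python
def shift_moments(shift_rates):
    """Return (D, R, mean_shift) of Eqs. (21)-(23).

    shift_rates maps a shift size to its rate P(shift size).
    """
    R = sum(shift_rates.values())
    D = sum(rate * size**2 for size, rate in shift_rates.items())
    mean_shift = sum(rate * size for size, rate in shift_rates.items()) / R
    return D, R, mean_shift

# Hypothetical shift distribution: sizes in ticks, rates in events per unit time.
P = {1: 0.5, 2: 0.25, 4: 0.25}
D, R, mean_shift = shift_moments(P)
```

In the model itself $P$ is not an input but must be determined self-consistently with the book profile, so this only illustrates the definitions.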



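Anywhere in the interior of the price range (where $p$ is
not at any stage the bid, ask, or a point in the spread),
Eq. (20) may be written

$$ \frac{\partial \Pi}{\partial \lambda} = \left[ \frac{D}{\delta(\lambda-1)} \frac{\partial^2}{\partial p^2} + \frac{\alpha\,dp - \mu/2\lambda}{\delta\sigma} \right] \Pi + \frac{\mu}{2\delta\sigma\lambda}\, \pi_0. \tag{24} $$

Evaluated at $\lambda \to 1$, with the use of the expansion (19),
this becomes

$$ \left[ 1 - \frac{D}{\delta} \frac{d^2}{dp^2} \right] \langle n \rangle = \frac{\alpha\,dp}{\delta} - \frac{\mu}{2\delta} \left( 1 - \pi_0 \right). \tag{25} $$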



At this point it is convenient to specialize to the case
$dp \to 0$, wherein the eligible values of any $n(p)$ become
just $\sigma$ and zero. The expectation is then related to the
probability of zero occupancy (at each $p$) as

$$ \langle n \rangle = \sigma \left[ 1 - \pi_0 \right], \tag{26} $$

yielding immediately

$$ \frac{\alpha\,dp}{\delta} = \left[ \frac{\mu}{2\delta\sigma} + \left( 1 - \frac{D}{\delta} \frac{d^2}{dp^2} \right) \right] \langle n \rangle. \tag{27} $$

Eq. (27) defines the general solution $\langle n(p) \rangle$ for the
master equation (14), in the continuum limit $2\alpha\,dp/\mu \to
0$. The shift distribution $P(\Delta p)$ appears only through
the diffusivity $D$, which must be solved self-consistently,
along with the otherwise arbitrary functions $\alpha$ and $\mu$.
The more general solution at large $dp$ is carried out in
App. pi.



A first step toward nondimensionalization may be
taken by writing Eq. (27) in the form (re-introducing
the indexing of the functions)

$$ \frac{\alpha(p)}{\alpha(\infty)} = \left[ \frac{\mu(p)}{\mu(0)} + \epsilon \left( 1 - \frac{D}{\delta} \frac{d^2}{dp^2} \right) \right] \frac{1}{\epsilon} \frac{\delta \langle n \rangle}{\alpha(\infty)\,dp}. \tag{28} $$

Far from the midpoint, where only depositions and cancellations
take place, orders in bins of width $dp$ are Poisson
distributed with mean $\alpha(\infty)\,dp/\delta$. Thus, the asymptotic
value of $\delta \langle n \rangle / \alpha(\infty)\,dp$ at large $p$ is unity. This is
consistent with a limit for $\alpha(p)/\alpha(\infty)$ of unity, and a
limit for the screened $\mu(p)/\mu(0)$ of zero. The reason for
grouping the nondimensionalized number density with
$1/\epsilon$, together with the proper normalization of the characteristic
price scale, will come from examining the decay
of the dimensionless function $\mu(p)/\mu(0)$.
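
The asymptotic statement above, that a bin far from the midpoint holds on average $\alpha(\infty)\,dp/\delta$ orders, can be checked with a small event-driven simulation of one bin. This is a sketch only: the function name, the seed, the run length, and the rates are assumptions, and orders are taken to be of unit size.

```python
import random

def time_averaged_occupancy(alpha_dp, delta, t_end, seed=0):
    """Time-average the number of orders in one bin with deposition at rate
    alpha_dp and independent cancellation of each order at rate delta."""
    rng = random.Random(seed)
    t, n, area = 0.0, 0, 0.0
    while t < t_end:
        total_rate = alpha_dp + delta * n
        wait = rng.expovariate(total_rate)
        wait = min(wait, t_end - t)          # do not integrate past t_end
        area += n * wait
        t += wait
        if t >= t_end:
            break
        # Choose which event fired, in proportion to its rate.
        if rng.random() * total_rate < alpha_dp:
            n += 1
        else:
            n -= 1
    return area / t_end

alpha_dp, delta = 5.0, 1.0           # hypothetical rates; mean should be near 5
avg = time_averaged_occupancy(alpha_dp, delta, t_end=2000.0)
```

With these rates the time average should settle close to alpha_dp / delta, up to sampling noise.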



3. Screening of the market-order rate 

In the context of independent fluctuations, Eq. (26)
implies a relation between the mean density and the rate
at which market orders are screened as price increases.
The effect of a limit order, resident in the price bin $p$ when
a market order survives to reach that bin, is to prevent
its arriving at the bin at $p + dp$. Though the nature of
the shift induced, when such annihilation occurs, depends
on the comoving frame being modeled, the change in the
number of orders surviving is independent of frame, and
is given by

$$ d\mu = -\mu \left( 1 - \pi_0 \right) = -\mu \langle n \rangle / \sigma. \tag{29} $$

Eq. (29) may be rewritten

$$ \frac{d \log\left( \mu(p)/\mu(0) \right)}{dp} = - \frac{1}{\epsilon} \left( \frac{2\alpha(\infty)}{\mu(0)} \right) \left( \frac{\delta \langle n(p) \rangle}{\alpha(\infty)\,dp} \right), \tag{30} $$

identifying the characteristic scale for prices as $p_c =
\mu(0)/2\alpha(\infty) = \mu/2\alpha$. Writing $\hat{p} = p/p_c$, the function
that screens market orders is the same as the argument
of Eq. (28), and will be denoted

$$ \frac{1}{\epsilon} \frac{\delta \langle n(p) \rangle}{\alpha(\infty)\,dp} = \psi(\hat{p}). \tag{31} $$

Defining a nondimensionalized diffusivity $\beta = D/\delta p_c^2$,
Eq. (27) can then be put in the form

$$ \frac{\alpha(\hat{p})}{\alpha(\infty)} = \left[ \frac{\mu(\hat{p})}{\mu(0)} + \epsilon \left( 1 - \beta \frac{d^2}{d\hat{p}^2} \right) \right] \psi(\hat{p}), \tag{32} $$

with

$$ \frac{\mu(\hat{p})}{\mu(0)} = \varphi(\hat{p}) = \exp\left( - \int_0^{\hat{p}} d\hat{p}'\, \psi(\hat{p}') \right). \tag{33} $$
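
Given any nondimensional density profile, the screening function of Eq. (33) can be evaluated by cumulative quadrature. The sketch below uses the trapezoid rule; the function name, the grid, and the trial profile (chosen because its integral has a closed form) are assumptions for illustration and are not a solution of the model.

```python
import numpy as np

def screening_function(p_hat, psi):
    """phi(p_hat) = exp(-integral_0^p_hat psi), by the trapezoid rule (Eq. 33)."""
    steps = np.diff(p_hat)
    increments = 0.5 * (psi[1:] + psi[:-1]) * steps
    integral = np.concatenate(([0.0], np.cumsum(increments)))
    return np.exp(-integral)

# Hypothetical nondimensional density: psi rises from 0 toward 1/epsilon.
epsilon = 0.2
p_hat = np.linspace(0.0, 5.0, 2001)
psi = (1.0 - np.exp(-p_hat)) / epsilon

phi = screening_function(p_hat, psi)
# Closed form for this psi: integral = (p_hat - 1 + exp(-p_hat)) / epsilon.
exact = np.exp(-(p_hat - 1.0 + np.exp(-p_hat)) / epsilon)
```

The resulting phi starts at one and decays monotonically, which is the screening behavior described for $\mu(\hat{p})/\mu(0)$.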



4. Verifying the conservation laws

Since nothing about the derivation so far has made explicit use of the
frame in which n is averaged, the combination of Eq. (32) with Eq. (33)
respects the conservation laws (9) and (10), if appropriate forms are
chosen for the deposition rate $\alpha(p)$.

For example, in the bid-centered frame, $\alpha(p)/\alpha(\infty) = 1$
everywhere. Multiplying Eq. (32) by dp and integrating over the whole
range from the bid to $+\infty$, we recover the nondimensionalized form
of Eq. (9):

$$\int_0^\infty dp\,(1 - \epsilon\psi) = 1, \qquad (34)$$

iff we are careful with one convention. The integral of the diffusion
term formally produces the first derivative $d\psi/dp|_0$. We must regard
this as a true first derivative, and consider its evaluation at zero
continued far enough below the bid to capture the identically zero first
derivative of the sell order depth profile.

In the midpoint-centered frame, the correct form for the source term
should be $\alpha(p)/\alpha(\infty) = 1 + \Pr(s/2 > p)$, whatever the
expression for the cumulative distribution function. Recognizing that the
integral of the CDF is, by parts, the mean value of s/2, the same
integration of Eq. (32) gives

$$\int_0^\infty dp\,(1 - \epsilon\psi) = 1 - \frac{\langle s\rangle}{2}, \qquad (35)$$

the nondimensionalized form of Eq. (10). Again, this works only if the
surface contribution from integrating the diffusion term vanishes.

Neither of these results required the assumption of independent
fluctuations, though that will be used below to give a simple approximate
form for $\Pr(s/2 > p) \approx \varphi(p)$. They therefore provide a check
that the extinction form (33) propagates market orders correctly into the
interior of the order-book distribution, to respect global conservation.
They also check the consistency of the intuitively plausible form for
$\alpha$ in the midpoint-centered frame. The detailed form is then
justified whenever the assumption of independent fluctuations is checked
to be valid.



5. Self-consistent parametrization

The assumption of independent fluctuations of n(p), used above to derive
the screening of market orders, is equivalent to a specification of the
CDF of the ask. Market orders are only removed between prices p and
p + dp in those instances when the ask is at p. Therefore

$$\Pr(s/2 > p) = \varphi(p), \qquad (36)$$

the continuum limit of Eq. (12). Together with the form
$\alpha(p)/\alpha(\infty) = 1 + \Pr(s/2 > p)$, Eq. (32) becomes



$$1 + \varphi = -\frac{d\varphi}{dp} - \epsilon\left[1 - \beta\frac{d^2}{dp^2}\right]\frac{d\log\varphi}{dp}. \qquad (37)$$



(If the assumption of independent fluctuations were valid
in the bid-centered frame, it would take the same form,
but with $\varphi$ removed on the left-hand side.)

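A small numerical sketch can make Eq. (37) concrete. Dropping the diffusion
term ($\beta = 0$, an assumption made here purely to keep the example short),
Eq. (37) reduces to $d\varphi/dp = -\varphi(1+\varphi)/(\varphi+\epsilon)$ with
$\psi = (1+\varphi)/(\varphi+\epsilon)$. The code below integrates this with a
fourth-order Runge-Kutta step and compares the "remainder area"
$\int dp\,(1-\epsilon\psi)$ with one minus the mean half-spread
$\int dp\,\varphi$, as the midpoint-centered conservation law (35) requires.
The function names, step count and cutoff `p_max` are illustrative choices.

```python
def solve_midpoint_frame(eps, p_max=10.0, n_steps=10000):
    """Integrate d(phi)/dp = -phi (1 + phi) / (phi + eps) from phi(0) = 1 (beta = 0)."""
    def rhs(phi):
        return -phi * (1.0 + phi) / (phi + eps)

    h = p_max / n_steps
    phi = 1.0
    phis = [phi]
    for _ in range(n_steps):          # classical fourth-order Runge-Kutta
        k1 = rhs(phi)
        k2 = rhs(phi + 0.5 * h * k1)
        k3 = rhs(phi + 0.5 * h * k2)
        k4 = rhs(phi + h * k3)
        phi += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        phis.append(phi)
    psis = [(1.0 + f) / (f + eps) for f in phis]   # psi = -d(log phi)/dp
    return h, phis, psis


def trapezoid(values, h):
    """Trapezoid rule on a uniform grid of spacing h."""
    return h * (sum(values) - 0.5 * (values[0] + values[-1]))


eps = 0.2
h, phis, psis = solve_midpoint_frame(eps)
mean_half_spread = trapezoid(phis, h)                       # integral of the CDF
remainder_area = trapezoid([1.0 - eps * s for s in psis], h)
```

With the diffusion term restored the same check applies only if the surface
term vanishes, as noted in the text.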
To consistently use the diffusion approximation, with
the realization that for p = 0, $n\pi(n, p - \Delta p) = 0$ for
essentially all $\Delta p$ in Eq. (|lj), it is necessary to set the
Fokker-Planck approximation to $\psi(0 - \langle\Delta p\rangle) = 0$ as a
boundary condition. Nondimensionalized, this gives



d 2 iP 
2 ~df 



= ?(< a ^-')< 



(38) 



where R is the rate at which shifts occur (Eq. g2|). In 
the solutions below, the curvature will typically be much 
smaller than $\psi(0) \sim 1$, so it will be convenient to enforce
the simpler condition 



$$\langle\Delta p\rangle\left.\frac{d\psi}{dp}\right|_0 - \psi(0) \approx 0, \qquad (39)$$



and verify that it is consistent once solutions have been 
evaluated. 

FIG. 19: Fit of the self-consistent solution with diffusivity
term to simulation results for the midpoint-centered frame.
Thin solid line is the analytic solution for the mean number
density, and thick solid line is simulation result, at $\epsilon = 0.02$.
Thin dashed line is the analytic prediction for the cumulative
distribution function Pr(s/2 < p), and thick dashed line is
simulation result.

Self-consistent expressions for $\beta$ and $\langle\Delta p\rangle$ are then
constructed as follows. Given an ask at some position a (in the
midpoint-centered frame), there is a range from -a to a in which sell
limit orders may be placed, which will induce positive midpoint-shifts.
The shift amount is half as great as the distance from the bid, so the
measure for shifts $dP_+(\Delta p)$ from sell limit-order addition inherits
a term $2\alpha(0)\,(d\Delta p)\,\Pr(a > \Delta p)$, where the last factor
counts the instances with asks large enough to admit shifts by $\Delta p$.
There is an equal contribution to $dP_-$ from addition of buy limit orders.
Symmetry requires that for every positive shift due to an addition, there
is a negative shift due to evaporation with equal measure, so the
contribution from buy limit order removal should equal that for sell limit
order addition. When these contributions are summed, the measures for
positive and negative shifts both equal



$$dP_\pm(\Delta p) = 4\alpha(\infty)\,(d\Delta p)\,\Pr(a > \Delta p). \qquad (40)$$



Eq. (40) may be inserted into the continuum limit of
the definition (21) for D, and then nondimensionalized
to give



$$\beta = \frac{4}{\epsilon}\int_0^\infty d\Delta p\,(\Delta p)^2\,\varphi(\Delta p), \qquad (41)$$



where the mean-field substitution of $\varphi(\Delta p)$ for
$\Pr(a > \Delta p)$ has been used. Similarly, the mean shift
amount used in Eq. (39) is



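$$\langle\Delta p\rangle = \frac{\int_0^\infty d\Delta p\,(\Delta p)\,\varphi(\Delta p)}{\int_0^\infty d\Delta p\,\varphi(\Delta p)}. \qquad (42)$$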


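The shift moments above are plain one-dimensional integrals of $\varphi$, so
they are easy to evaluate once $\varphi$ is known on a grid. The sketch below
computes the mean shift of Eq. (42) and the second-moment integral
$\int d\Delta p\,(\Delta p)^2\varphi(\Delta p)$ that enters $\beta$ (without any
prefactor). The exponential test profile and the name `shift_moments` are
assumptions for illustration only; in a self-consistent calculation
$\varphi$ would come from the solution of Eq. (37).

```python
import math

def shift_moments(dp_grid, phi):
    """Return (mean shift of Eq. (42), raw second-moment integral of phi)."""
    def trapz(values):
        total = 0.0
        for i in range(1, len(dp_grid)):
            total += 0.5 * (values[i] + values[i - 1]) * (dp_grid[i] - dp_grid[i - 1])
        return total

    norm = trapz(phi)
    first = trapz([x * f for x, f in zip(dp_grid, phi)])
    second = trapz([x * x * f for x, f in zip(dp_grid, phi)])
    return first / norm, second

# Hypothetical screening profile phi = exp(-dp / scale), for testing only.
scale = 0.5
grid = [0.001 * i for i in range(20001)]        # 0 ... 20
phi = [math.exp(-x / scale) for x in grid]
mean_shift, second_moment = shift_moments(grid, phi)
```

For the exponential test profile the mean shift equals `scale` and the second
moment equals `2 * scale**3`, which provides an independent check of the
quadrature.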

A fit of Eq. (37) to simulations, using these self-consistent
measures for shifts, is shown in Fig. 19. This
solution is actually a compromise between approxima- 
tions with opposing ranges of validity. The diffusion 
equation using the mean order depth describes nonzero 
transport of limit orders through the midpoint, an ap- 
proximation inconsistent with the correlations of shifts 



with states of the order book. This approximation is a
small error only at $\epsilon \to 0$. On the other hand, both the
form of $\alpha$, and the self-consistent solutions for $\langle\Delta p\rangle$ and
$\beta$, made use of the mean-field approximation, which we
saw was only valid for e < 1. The two approximations 
appear to create roughly compensating errors in the in- 
termediate range e ~ 0.02. 



6. Accounting for correlations 

The numerical integral implementing the diffusion so- 
lution actually doesn't satisfy the global conservation 
condition that the diffusion term integrate to zero over 
the whole price range. Thus, it describes diffusive trans- 
port of orders through the midpoint, and as such also 
doesn't have the right p = 0 boundary condition. The
effective absorbing boundary represented by the pure dif- 
fusion solution corresponds roughly to the approxima- 
tion made by Bouchaud et al. It differs from theirs, 
though, in that their method of images effectively ap- 
proximates the region of the spread as a point, whereas 
Eq. (32) actually resolves the screening of market orders
as the spread fluctuates. 

Treating the spread region - roughly defined as the 
range over which market orders are screened - as a point 
is consistent with treating the resulting coarse-grained 
"midpoint" as an absorbing boundary. If the spread is re- 
solved, however, it is not consistent for diffusion to trans- 
port any finite number density through the midpoint, be- 
cause the midpoint is strictly always in the center of an 
open set with no orders, in a continuous price space. The 
correct behavior in a neighborhood of the "fine-grained 
midpoint" can be obtained by explicitly accounting for 
the correlation of the state of orders, with the shifts that 
are produced when market or limit order additions occur. 

We expect the problem of recovering both the global 
conservation law and the correct p = 0 boundary condition
to be difficult, as it should be responsible for the
non-trivial corrections to short-term and long-term dif- 
fusion mentioned earlier. We have found, however, that 
by explicitly sacrificing the global conservation law, we 
can incorporate the dependence of shifts on the position 
of the ask, in an interesting range around the midpoint. 
At general e, the corrections to diffusion reproduce the 
mean density over the main support of the CDF of the 
spread. While the resulting density does not predict that 
CDF (due to correlated fluctuations), it closely enough 
resembles the real density that the independent CDFs of 
the two are similar. 



7. Generalizing the shift-induced source terms 

Nondimensionalizing the generating-functional master
equation (20) and keeping leading terms in dp at $\lambda \to 1$,
we get

$$\frac{\alpha(p)}{\alpha(\infty)} = \left[\frac{\mu(p)}{\mu(0)} + \epsilon\right]\psi(p)$$
$$\qquad - \int dP_+(\Delta p)\,[\psi(p - \Delta p) - \psi(p)]$$
$$\qquad - \int dP_-(\Delta p)\,[\psi(p + \Delta p) - \psi(p)], \qquad (43)$$

where $dP_\pm(\Delta p)$ is the nondimensionalized measure that
results from taking the continuum limit of $P_\pm$ in the variable
$\Delta p$.

FIG. 20: Reconstruction with source terms S that approximately
account for correlated fluctuations near the midpoint.
$\epsilon = 0.2$. Thick solid line is averaged order book depth from
simulations, and thin solid is the mean field result. Thin dotted
line is the simulated CDF for s/2, and thick dotted line
is the mean field result. Thick dashed line is the CDF that
would be produced from the simulated depth, if the mean-field
approximation were exact.

Eq. (43) is inaccurate because the number of orders
shifted into or out of a price bin p, at a given spread,
may be identically zero, rather than the unconditional
mean value $\psi$. We take that into account by replacing
the last two lines of Eq. (43) with lists of source terms,
whose forms depend on the position of the ask, weighted
by the probability density for that ask. Independent
fluctuations are assumed by using Eq. (36).

It is convenient at this point to denote the replacement
of the last two lines of Eq. (43) with the notation S,
yielding



$$\frac{\alpha(p)}{\alpha(\infty)} = \left[\frac{\mu(p)}{\mu(0)} + \epsilon\right]\psi - S. \qquad (44)$$



The global conservation laws for orders would be satisfied
if $\int dp\,S = 0$.

The source term S is derived approximately in App. B. The solution
to Eq. (44) at $\epsilon = 0.2$, with the simple-diffusive source term
replaced by the evaluations (B29 - B37), is compared to the simulated
order-book depth and spread distribution in Fig. 20. The simulated
$\langle n(p)\rangle$ satisfies Eq. (35), showing what is the correct
"remainder area" below the line $\langle n\rangle = 1$. The numerical
integral deviates from that value by the incorrect integral
$\int dp\,S \neq 0$. However, most of the probability for
the spread lies within the range where the source terms
S are approximately correct, and as a result the distribution
for s/2 is predicted fairly well.

FIG. 21: Reconstruction with correlated source terms for
$\epsilon = 0.02$. Line style and thickness are the same as in Fig. 20.

FIG. 22: Reconstruction with correlated source terms for
$\epsilon = 0.002$. Line style and thickness are the same as in Fig. 20.

Even where the mean-field approximation is known to
be inadequate, the source terms defined here capture
most of the behavior of the order-book distribution in
the region that affects the spread distribution. Fig. 21
shows the comparison to simulations for $\epsilon = 0.02$, and
Fig. 22 for $\epsilon = 0.002$. Both cases fail to reproduce the
distribution for the spread, and also fail to capture the
large-p behavior of $\psi$. However, they approximate $\psi$ at
small p well enough that the resulting distribution for
the spread is close to what would be produced by the
simulated $\psi$ if fluctuations were independent.



G. A mean-field theory of order separation
intervals: The Independent Interval Approximation

A simplifying assumption that is in some sense dual 
to independent fluctuations of n (p), is independent fluc- 
tuations in the intervals x (N) at different N. Here we 
develop a mean-field theory for the order separation in- 
tervals in this model. From this, we will also be able to 
make an estimate of the depth profiles for any value of 
the parameters. For convenience of notation we will use
$x_N$ to denote $x(N)$.

Limit order placements are considered to take place 
strictly on sites which are not occupied. This is the same 
level of approximation as made in the previous section. 
The time step is normalized to unity, as above, so that 
rates are equal to probabilities after one update of the 
whole configuration. The rates $\alpha$ and $\mu$ used in this section
correspond to $\alpha(\infty)$ and $\mu(0)$ as defined earlier.

As shown in Fig. the configuration is entirely spec- 
ified instant by instant if the instantaneous values of the 
order separation intervals are known. 

Consider now how these intervals might change due
to various processes. For the spread $x_0$, these processes
and the corresponding changes in $x_0$ are listed below.

1. $x_0 \to x_0 + x_1$ with rate $(\delta + \mu/2\sigma)$ (when the ask
either evaporates or is deleted by a market order).

2. $x_0 \to x_0 + x_{-1}$ with rate $(\delta + \mu/2\sigma)$ (when the bid
either evaporates or is deleted by the corresponding
market order).

3. $x_0 \to x'$ for any value $1 \le x' \le x_0 - 1$, when
a sell limit order is deposited anywhere in the
spread. The rate for any single deposition is
$\alpha\,dp/\sigma$, so the cumulative rate for some deposition
is $\alpha\,dp\,(x_0 - 1)/\sigma$. (The -1 comes from the
prohibition against depositing on occupied sites.)

4. Similarly $x_0 \to x_0 - x'$ for any $1 \le x' \le x_0 - 1$,
when a buy limit order is deposited in the spread,
also with cumulative rate $\alpha\,dp\,(x_0 - 1)/\sigma$.

5. Since the above processes describe all possible
single-event changes to the configuration, the probability
that it remains unchanged in a single time
step is $1 - 2\delta - \mu/\sigma - 2\alpha\,dp\,(x_0 - 1)/\sigma$.

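In all that follows, we will put $\sigma = 1$ without loss of
generality. If we know $x_0$, $x_1$, and $x_{-1}$ at time t, the
expected value at time t + dt is then

$$x_0(t + dt) = x_0(t)\,[1 - 2\delta - \mu - 2\alpha\,dp\,(x_0 - 1)]$$
$$\qquad + (x_0 + x_1)\left(\delta + \frac{\mu}{2}\right) + (x_0 + x_{-1})\left(\delta + \frac{\mu}{2}\right)$$
$$\qquad + (\alpha\,dp)\,x_0(x_0 - 1). \qquad (45)$$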

Here, $x_i(t)$ represents the value of the interval averaged
over many realizations of the process evolved up to time 
t. 



Again representing the finite difference as a time
derivative, the change in the expected value, given $x_0$,
$x_1$, and $x_{-1}$, is

$$\frac{dx_0}{dt} = \left(\delta + \frac{\mu}{2}\right)(x_1 + x_{-1}) - (\alpha\,dp)\,x_0(x_0 - 1). \qquad (46)$$


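The step from the event list to Eq. (46) can be checked by brute force: sum
rate times change in $x_0$ over processes 1-4 and compare with the closed
form. The sketch below does this for integer tick counts with $\sigma = 1$.
The function names and the numerical values of the rates are hypothetical,
chosen only for the check.

```python
def spread_drift_by_enumeration(x0, x1, xm1, alpha_dp, delta, mu):
    """Sum (rate x change in x0) over the single-event processes 1-4."""
    drift = (delta + 0.5 * mu) * x1        # ask removed: x0 -> x0 + x1
    drift += (delta + 0.5 * mu) * xm1      # bid removed: x0 -> x0 + x_{-1}
    for xp in range(1, x0):                # sell limit order: x0 -> x'
        drift += alpha_dp * (xp - x0)
    for xp in range(1, x0):                # buy limit order: x0 -> x0 - x'
        drift += alpha_dp * (-xp)
    return drift


def spread_drift_closed_form(x0, x1, xm1, alpha_dp, delta, mu):
    """Right-hand side of Eq. (46)."""
    return (delta + 0.5 * mu) * (x1 + xm1) - alpha_dp * x0 * (x0 - 1)


# Hypothetical configuration and rates, for illustration only.
args = dict(x0=7, x1=3, xm1=4, alpha_dp=0.01, delta=0.002, mu=0.1)
enumerated = spread_drift_by_enumeration(**args)
closed_form = spread_drift_closed_form(**args)
```

The two deposition loops each contribute $-\alpha\,dp\,x_0(x_0-1)/2$, which is
where the quadratic term in Eq. (46) comes from.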

Were it not for the quadratic term arising from deposition,
Eq. (46) would be a linear function of $x_0$, $x_1$, and
$x_{-1}$. However, we now need an approximation for $\langle x_0^2\rangle$,
where the angle brackets represent an average over realizations
as before, or equivalently a time average in the
steady state. Let us for the moment assume that we can
approximate $\langle x_0^2\rangle$ by $a\langle x_0\rangle^2$, where a is a
constant, as yet undetermined, to be fixed self-consistently.
We will make this approximation for all the $x_k$'s. This
is clearly not entirely accurate because the PDF of $x_k$
could depend on k (as indeed it does; we will comment
on this a little later). However, as we will see, this is still
a very good approximation.

We will therefore make this approximation in Eq. (46)
and everywhere below, and look for steady state solutions
when the $x_k$'s have reached a time independent average
value.

It then follows that

$$\left(\delta + \frac{\mu}{2}\right)(x_1 + x_{-1}) = a\,\alpha\,dp\,x_0(x_0 - 1). \qquad (47)$$

The interval $x_k$ may be thought of as the inverse of the
density at a distance $\sum_{j=0}^{k-1} x_j$ from the bid. That is,
$x_i \approx 1/\langle n(\sum_{j=0}^{i-1} x_j\,dp)\rangle$, the dual to the
mean depth, at least at large i. It therefore makes sense to introduce
a normalized interval

$$\hat{x}_i \equiv \epsilon\,x_i\,\frac{\alpha}{\delta}\,dp = \frac{x_i\,dp}{p_c}, \qquad (48)$$

the mean-field inverse of the normalized depth $\psi$. In this
nondimensionalized form, Eq. (47) becomes



$$(1 + \epsilon)(\hat{x}_1 + \hat{x}_{-1}) = a\,\hat{x}_0(\hat{x}_0 - d\hat{p}), \qquad (49)$$

where $d\hat{p} = dp/p_c$.

Since the depth profile is symmetric about the origin,
$\hat{x}_1 = \hat{x}_{-1}$. From the equations, it can be seen that this
ansatz is self-consistent and extends to all higher $\hat{x}_i$.
Substituting this in Eq. (49) we get

$$(1 + \epsilon)\,\hat{x}_1 = \frac{a}{2}\,\hat{x}_0(\hat{x}_0 - d\hat{p}) = (1 + \epsilon)\,\hat{x}_{-1}. \qquad (50)$$



Proceeding to the change of $x_1$, the events that can occur,
with their probabilities, are shown in Table V, with
the remaining probability that $x_1$ remains unchanged.

The differential equation for the mean change of $x_1$ can
be derived along previous lines and becomes

$$\frac{dx_1}{dt} = \left(2\delta + \frac{\mu}{2}\right)x_2 - \left(\delta + \frac{\mu}{2}\right)x_1 + \alpha\,dp\left[\frac{x_0(x_0 - 1)}{2} - \frac{x_1(x_1 - 1)}{2} - x_1(x_0 - 1)\right]. \qquad (51)$$

TABLE V: Events that can change the value of $x_1$, with their
rates of occurrence.

case                      rate                  range
$x_1 \to x_2$             $(\delta + \mu/2)$
$x_1 \to (x_1 + x_2)$     $\delta$
$x_1 \to x'$              $\alpha\,dp$          $x' \in (1, x_0 - 1)$
$x_1 \to x_1 - x'$        $\alpha\,dp$          $x' \in (1, x_1 - 1)$

TABLE VI: Theoretical vs. results from simulations for $S_\infty$.

$\epsilon$    $S_\infty$ from theory    $S_\infty$ from MCS
0.66          1                         1.000
0.2           1                         1.000
0.04          1                         0.998
0.02          1                         1.000

Note that in the above equations, the mean-field approximation
consists of assuming that terms like $\langle x_0 x_1\rangle$ are
approximated by the product $\langle x_0\rangle\langle x_1\rangle$. This is thus an
'independent interval' approximation.

Nondimensionalizing Eq. (51) and combining the result
with Eq. (50) gives the stationary value for $\hat{x}_2$ from $\hat{x}_0$
and $\hat{x}_1$,

$$(1 + 2\epsilon)\,\hat{x}_2 = \frac{a}{2}\,\hat{x}_1(\hat{x}_1 - d\hat{p}) + \hat{x}_1(\hat{x}_0 - d\hat{p}). \qquad (52)$$

Following the same procedure for general k, the
nondimensionalized recursion relation is

$$(1 + k\epsilon)\,\hat{x}_k = \frac{a}{2}\,\hat{x}_{k-1}(\hat{x}_{k-1} - d\hat{p}) + \hat{x}_{k-1}\sum_{i=0}^{k-2}(\hat{x}_i - d\hat{p}). \qquad (53)$$

1. Asymptotes and conservation rules 

Far from the bid or ask, $\hat{x}_k$ must go to a constant
value, which we denote $\hat{x}_\infty$. In other words, for large k,
$\hat{x}_{k+1} \to \hat{x}_k$. Taking the difference of Eq. (53) for
k + 1 and k in this limit gives the identification

$$\epsilon\,\hat{x}_\infty = \hat{x}_\infty(\hat{x}_\infty - d\hat{p}), \qquad (54)$$

or $\hat{x}_\infty = \epsilon + d\hat{p}$. Apart from the factor of $d\hat{p}$,
arising from the exclusion of deposition on already-occupied sites, this
agrees with the limit $\psi(\infty) \to 1/\epsilon$ found earlier. In the
continuum limit $d\hat{p} \to 0$ at fixed $\epsilon$, these are the same.

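From the large-k limit of Eq. (53), one can also solve
easily for the quantity $S_\infty \equiv \sum_{i=0}^{\infty}(\hat{x}_i - \hat{x}_\infty)$,
which is related to the bid-centered order conservation law
mentioned in Section III C. Dividing by a factor of $\hat{x}_\infty$ at
large k,

$$(1 + k\epsilon) = \frac{a}{2}(\hat{x}_\infty - d\hat{p}) + \sum_{i=0}^{k-2}(\hat{x}_i - d\hat{p}), \qquad (55)$$

or, using Eq. (54) and rewriting the sum on the right-hand
side as $\sum_{i=0}^{k-2}(\hat{x}_i - \hat{x}_\infty) + \sum_{i=0}^{k-2}(\hat{x}_\infty - d\hat{p})$,

$$1 + \left(1 - \frac{a}{2}\right)\epsilon = S_\infty. \qquad (56)$$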

The interpretation of $S_\infty$ is straightforward. There are
k + 1 orders in the price range $\sum_{i=0}^{k} x_i$. Their decay rate
is $\delta(k + 1)$, and the rate of annihilation from market orders
is $\mu/2$. The rate of additions, up to an uncertainty
about what should be considered the center of the interval,
is $(\alpha\,dp)\sum_{i=0}^{k}(x_i - 1)$ in the bid-centered frame
(where effective $\alpha$ is constant and additions on top of
previously occupied sites are forbidden). Equality of addition
and removal is the bid-centered order conservation
law (again), in the form

$$\frac{\mu}{2} + \delta(k + 1) = \alpha\,dp\sum_{i=0}^{k}(x_i - 1). \qquad (57)$$

Taking k large, nondimensionalizing, and using Eq. (54),
Eq. (57) becomes

$$1 = S_\infty. \qquad (58)$$



This conservation law is indeed respected to a remarkable
accuracy in Monte Carlo simulations of the model
as indicated in Table VI.

In order that the equation for the $\hat{x}$'s obey this exact
conservation law, we require Eq. (56) to be equal to Eq. (58).
We can hence now self-consistently set the value of
a = 2.

The value of a implies that we have now set
$\langle x_k^2\rangle \sim 2\langle x_k\rangle^2$. This would be strictly true if
the interval $x_k$ were exponentially distributed for all k. This is
generally a good approximation for large k for any $\epsilon$. Fig. 23
shows the numerical results from Monte Carlo simulations of the model,
for the probability distribution function for three intervals
$x_0$, $x_1$ and $x_5$ at $\epsilon = 0.1$. The functional forms for
$P(x_0)$ and $P(x_1)$ are better approximated by a Gaussian than
an exponential. However $P(x_5)$ is clearly an exponential.

Eq. (58) has an important consequence for the short-term
and long-term diffusivities, which can also be seen in
simulations, as mentioned in earlier sections. The
nondimensionalization of the diffusivity D with the rate
parameters suggests a classical scaling of the diffusivity



D 



2X M X 



(59) 



As mentioned earlier, it is observed from simulations that
the locally best short-time fit to the actual diffusivity of
the midpoint is $\sim\sqrt{1/\epsilon}$ times the estimate (59), and the
long-time diffusivity is $\sim\sqrt{\epsilon}$ times the classical estimate.
While we do not yet know how to derive this relation
analytically, the fact that early and late-time renormalizations
must have this qualitative relation can be argued
from the conservation law (58).

FIG. 23: The probability distribution functions $P_x(y)$ vs. y
for the intervals $x = x_0, x_1$ and $x_5$ at $\epsilon = 0.1$, on a semi-log
scale. Solid curve is for $x_0$, dashed for $x_1$, and dot for $x_5$. The
functional form of the distribution changes from a Gaussian
to an exponential.

FIG. 24: The mean value of the spread in nondimensional
units $\hat{s} = s/p_c$ as a function of $\epsilon$. The numerical value
above (solid) is compared with the theoretical estimate below (dash).



$S_\infty$ is the area enclosed between the actual density and
the asymptotic value. Increases in 1/e (descaled market- 
order rate) deplete orders near the spread, diminishing 
the mean depth at small p, and induce the upward cur- 
vature seen in Fig. ||, and even more strongly in Fig. |28| 
below. As noted above, they cause more frequent shifts 
(more than compensating for the slight decrease in average
step size), and increase the classically descaled diffusivity
$\beta$. However, as a result, this increases the fraction
of the area in $S_\infty$ accumulated near the spread, requiring
that the mean depth at larger p increase to compensate
(see Fig. 8). The resulting steeper approach to
the asymptotic depth at prices greater than the mean 
spread, and the larger negative curvature of the distribu- 
tion, are fit by an effective diffusivity that decreases with 
increasing 1/e. Since the distribution further from the 
midpoint represents the imprint of market order activity 
further in the past, this effective diffusivity describes the 
long-term evolution of the distribution. The resulting
anticorrelation of the small-p and large-p effective diffusion
constants implied by conservation of the area $S_\infty$ is
exactly consistent with their respective $\sim\sqrt{1/\epsilon}$ and
$\sim\sqrt{\epsilon}$ scalings. The general idea here is to connect diffusivities
at short and long time scales to the depth profile near 
the spread and far away from the spread respectively. 
The conservation law for the depth profile, then implies 
a connection between these two diffusivities. 



2. Direct simulation in interval coordinates 

The set of equations determined by the general
form (53) is ultimately parametrized by the single input
$\hat{x}_0$. The correct value for $\hat{x}_0$ is determined when
the $\hat{x}_k$ are solved recursively, by requiring convergence
to $\hat{x}_\infty$. We do this recursion numerically, in the same
manner as was done to solve the differential equation for
the normalized mean density $\psi(p)$.

FIG. 25: Four pairs of curves for the quantity $\hat{x}_k/\hat{x}_\infty - 1$
vs. k. The value of $\epsilon$ increases from top to bottom ($\epsilon$ =
0.02, 0.04, 0.2, 0.66). In each pair of curves, the markers are
obtained from simulations while the solid curve is the prediction
of Eq. (53) evaluated numerically. The difference between
numerics and mean-field increases as $\epsilon$ decreases, especially
for large k.

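One way such a recursive solution could be organized is as a shooting problem
on $\hat{x}_0$. The sketch below iterates Eq. (53) with a = 2 and bisects on the
trial value of $\hat{x}_0$. It assumes that the physical solution decreases
monotonically toward $\hat{x}_\infty = \epsilon + d\hat{p}$ (as the curves in
Fig. 25 suggest), so a trial that dips below $\hat{x}_\infty$ is treated as too
small and one that starts growing again as too large. The bisection strategy,
the function names, the bracket and the default $d\hat{p} = 0$ are assumptions
of this example, not a description of the procedure actually used for the
figures.

```python
def iia_intervals(x0, eps, dp=0.0, a=2.0, kmax=5000):
    """Iterate the recursion of Eq. (53) starting from a trial spread x0.

    Returns (xs, verdict): xs holds x_0 ... x_k up to the step before the
    iteration leaves the admissible band; verdict is -1 if the next interval
    would fall below x_inf = eps + dp (trial x0 too small), +1 if it would
    start growing again (trial x0 too large), and 0 if neither happened
    within kmax steps.
    """
    x_inf = eps + dp
    xs = [x0]
    partial = 0.0                      # sum_{i=0}^{k-2} (x_i - dp)
    for k in range(1, kmax + 1):
        prev = xs[-1]
        xk = (0.5 * a * prev * (prev - dp) + prev * partial) / (1.0 + k * eps)
        if xk < x_inf:
            return xs, -1
        if xk > prev:
            return xs, +1
        partial += prev - dp
        xs.append(xk)
    return xs, 0


def solve_x0(eps, dp=0.0, a=2.0, iterations=80):
    """Bisect on x0 between a value that undershoots and one that overshoots."""
    lo, hi = eps + dp, 2.0 * (1.0 + eps)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        _, verdict = iia_intervals(mid, eps, dp, a)
        if verdict < 0:
            lo = mid
        elif verdict > 0:
            hi = mid
        else:
            return mid
    return 0.5 * (lo + hi)


eps = 0.2
x0 = solve_x0(eps)
xs, _ = iia_intervals(x0, eps)
s_inf = sum(x - eps for x in xs)       # estimate of sum_i (x_i - x_inf) at dp = 0
```

Because the recursion is unstable away from the correct $\hat{x}_0$, the
sequence is only trusted up to the point where it leaves the band; the sum
`s_inf` over that range should be close to 1, mirroring Eq. (58).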
In Fig. 24 we compare the numerical result for $\hat{x}_0$ with
the analytical estimate generated as explained above.
The results are surprisingly good throughout the entire
range. Although the theoretical value consistently
underestimates the numerical value, the functional form is
captured accurately.

In Fig. 25, the values of $\hat{x}_k$ for all k are compared to
the values determined directly from simulations.

Fig. 26 shows the same data on a semilog scale for
$\hat{x}_k/\hat{x}_\infty - 1$, showing the exponential decay at large
argument characteristic of a simple diffusion solution. The
IIA is clearly a good approximation for large $\epsilon$. However,
for small $\epsilon$ it starts deviating significantly from the
simulations, especially for large k.

FIG. 26: Same plot as Fig. 25 but on a semi-log scale to
show exponential decay at large k.

The values of $\hat{x}_k$ computed from the IIA can be used very
directly to estimate the price impact.
The price impact, as defined in earlier sections, can be
thought of as the change in the position of the midpoint
(or the bid) following the filling of a certain number of
orders. Within the framework of the simplified model we
study here, this is simply the quantity
$\langle\Delta m\rangle = \frac{1}{2}\sum_{k'=1}^{k} x_{k'}$, for k orders. The
factor of 1/2 comes from considering the change in the position
of the midpoint and not the bid. Fig. 27 shows $\langle\Delta m\rangle$
nondimensionalized by $p_c$ plotted as a function of the number
of orders (multiplied by $\epsilon$), for three different values of $\epsilon$.
Again, the theory matches the numerics quite well qualitatively.
For large $\epsilon$ the agreement is quantitative as well.

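Given a list of mean intervals, the price impact defined above is just half a
running sum. The short sketch below makes the indexing explicit: element 0 of
the input is the spread $x_0$ and is skipped, since the sum starts at
$k' = 1$. The function name and the sample intervals are hypothetical.

```python
def price_impact(intervals):
    """Return [<dm>_1, <dm>_2, ...] with <dm>_k = 0.5 * sum_{k'=1}^{k} x_{k'}.

    intervals[0] is the spread x_0 and does not enter the sum.
    """
    impacts = []
    running = 0.0
    for x in intervals[1:]:
        running += x
        impacts.append(0.5 * running)
    return impacts

# Hypothetical interval values, for illustration only.
impacts = price_impact([1.0, 0.5, 0.25, 0.25])
```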
The simplest approximation to the density profiles in
the midpoint-centered frame is to continue to approximate
the mean density as $1/\hat{x}_k$, but to regard that density
as evaluated at position $\hat{x}_0/2 + \sum_{k'=1}^{k}\hat{x}_{k'}$. This clearly
is not an adequate treatment in the range of the spread,
both because the intervals are discrete, whereas mean
$\psi$ is continuous, and because the density profiles satisfy
different global conservation laws associated with
non-constancy of $\alpha$. For large k, however, this approximation
might hold. The mean-field values (only) corresponding
to a plot of $\epsilon\psi(p)$ versus p are shown in Fig. 28. Here
the theoretically estimated $\hat{x}_k$'s at different parameter
values are used to generate the depth profile using the
procedure detailed above.

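The construction just described maps a list of intervals to a set of
(position, density) points. A minimal sketch, with a hypothetical function
name and made-up interval values, is:

```python
def depth_profile(intervals):
    """Return (position, density) pairs in the midpoint-centered frame.

    The density 1 / x_k is assigned to position x_0 / 2 + sum_{k'=1}^{k} x_{k'},
    for k = 1, 2, ...; intervals[0] is the spread x_0.
    """
    position = 0.5 * intervals[0]
    profile = []
    for x in intervals[1:]:
        position += x
        profile.append((position, 1.0 / x))
    return profile

# Hypothetical interval values, for illustration only.
profile = depth_profile([1.0, 0.5, 0.25])
```

As the text cautions, points produced this way are only meaningful away from
the spread region.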
A comparison of the theoretically estimated profiles
with the results from Monte Carlo simulations of the
model is shown in Fig. 29. As is evident, the theoretical
estimate for the density profile is better for large $\epsilon$
than for small $\epsilon$.

We can also generalize the above analysis to when the
order placement process is no longer uniform. In particular
it has been found that a power-law order placement
process is relevant p3L E6| . We carry out the above analysis
for when $\alpha = \Delta_0^\beta/(\Delta + \Delta_0)^\beta$, where $\Delta$ is the distance
from the current bid and $\Delta_0$ determines the 'shoulder' of
the power law. We find an interesting dependence of the
existence of solutions on $\beta$. In particular we find that for
$\beta > 1$, $\Delta_0$ needs to be larger than some value (which depends
on $\beta$ as well as other parameters of the model such
as $\mu$ and $\delta$) for solutions of the IIA to exist. This might
be interpreted as a market order wiping out the entire
book, if the exponent is too large. When solutions exist,
we find that the depth profile has a peak, consistent
with the findings of p3|. In Fig. 30 the depth profiles
for three different values of $\Delta_0$ are plotted.

FIG. 27: Three pairs of curves for the quantity $\langle\Delta m\rangle/p_c$
vs. $N\epsilon$ where $\langle\Delta m\rangle = \frac{1}{2}\sum_{k=1}^{N} x_k$. The value of
$\epsilon$ increases from top to bottom ($\epsilon$ = 0.002, 0.02, 0.2). In each
pair of curves, the markers are obtained from simulations while the
solid curve is the prediction of the IIA. For $\epsilon = 0.002$, we
show only the theoretical prediction. The theory captures
the functional form of the price impact curves for different
$\epsilon$. Quantitatively, it is better for larger $\epsilon$, as remarked
earlier.

FIG. 28: Density profiles for different values of $\epsilon$ ranging
over the values 0.2, 0.02, 0.004, 0.001, obtained from the
Independent Interval Approximation.

FIG. 29: Density profiles from Monte Carlo simulation
(markers) and the Independent Interval Approximation
(lines). Pluses and dash line are for $\epsilon = 0.2$, while crosses and
dotted line are for $\epsilon = 0.02$.

FIG. 30: Density profiles for a power-law order placement
process for different values of $\Delta_0$.

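A quick way to see why $\beta = 1$ separates two regimes is to integrate the
placement rate over all distances. Assuming the normalization
$\alpha(\Delta) = (\Delta_0/(\Delta + \Delta_0))^\beta$ (so that $\alpha(0) = 1$, an
assumption of this sketch), the total rate is $\Delta_0/(\beta - 1)$ for
$\beta > 1$ and diverges otherwise. The code compares the closed form with a
truncated midpoint-rule integral; all names and parameter values are
illustrative.

```python
import math

def total_deposition(beta, delta0):
    """Integral over Delta in [0, inf) of (delta0 / (Delta + delta0))**beta."""
    if beta <= 1.0:
        return math.inf                 # the integral diverges for beta <= 1
    return delta0 / (beta - 1.0)


def truncated_deposition(beta, delta0, cutoff, n=200000):
    """Midpoint-rule estimate of the same integral up to a finite cutoff."""
    h = cutoff / n
    return h * sum((delta0 / ((i + 0.5) * h + delta0)) ** beta for i in range(n))


full = total_deposition(2.0, 3.0)
partial = truncated_deposition(2.0, 3.0, cutoff=3000.0)
```

A finite total placement rate means only a finite number of orders is stored
on average, which is the sense in which a sufficiently large market order
could clear the book.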


IV. CONCLUDING REMARKS 

A. Ongoing work on empirical validation 

Members of our group are working on the problem of 
empirically testing this model. We are using a dataset 
from the London Stock Exchange. We have chosen this 
data because it contains every order and every cancel- 
lation. This makes it possible to measure all the pa- 
rameters of this model directly. It is also possible to re- 
construct the order book and measure all the statistical 
properties we have studied in this paper. Our empirical 
work so far shows that, despite its many limitations, our 



model can act as an effective guide to future research. 
We believe that the main discrepancies between the pre- 
dictions of our model and the data can be dealt with 
by using a more sophisticated model of order flow. We 
summarize some of the planned improvements in the fol- 
lowing subsection. 



B. Future Enhancements 

As we have mentioned above, the zero intelligence, IID 
order flow model should be regarded as just a starting 
point from which to add more complex behaviors. We 
are considering several enhancements to the order flow 
process whose effects we intend to discuss in future pa- 
pers. Some of the enhancements include: 

• Trending of order flow. 

We have demonstrated that IID order flow neces- 
sarily leads to non-IID prices. The converse is also 
true: Non-IID order flow is necessary for IID prices. 
In particular, the order flow must contain trends, 
i.e. if order flow has recently been skewed toward 
buying, it is more likely to continue to be skewed 
toward buying. If we assume perfect market effi- 
ciency, in the sense that prices are a random walk, 
this implies that there must be trends in order flow. 

• Power law placement of limit prices 

For both the London Stock Exchange and the Paris 
Bourse, the distribution of the limit price relative 
to the best bid or ask appears to decay as a power- 
law ||^, [2(|. Our investigations of this show that 
this can have an important effect. Exponents larger 
than one result in order books with a finite num- 
ber of orders. In this case, depending on other pa- 
rameters, there is a finite probability that a single 
market order can clear the entire book (see Section

• Power law or log-normal order size distribution.

Real order placement processes have order size dis- 
tributions that appear to be roughly like a log- 
normal distribution with a power law tail J27J . This 
has important effects on the fluctuations in liquid- 
ity. 

• Non-Poisson order cancellation process. 

When considered in real time order placement can- 
cellation does not appear to be Poisson (2Lj . How- 
ever, this may not be a bad approximation in event 
time rather than real time. 

• Conditional order placement. 

Agents may conditionally place larger market or- 
ders when the book is deeper, causing the market 
impact function to grow more slowly. We intend 
to measure this effect and incorporate it into our 
model.

• Feedback between order flow and prices.

In reality there are feedbacks between order flow 
and price movements, beyond the feedback in the 
reference point for limit order placement built into 
this model. This can induce bursts of trading, caus- 
ing order flow rates to speed up or slow down, and 
give rise to clustered volatility. 

The last item is just one of many examples of how one 
can surely improve the model by making order flow con- 
ditional on available information. However, we believe it 
is important to first gain an understanding of the prop- 
erties of simple unconditional models, and then build on 
this foundation to get a fuller understanding of the prob- 
lem. 



C. Comparison to standard models based on
valuation and information arrival



In the spirit of Gode and Sunder Q , we assume a sim- 
ple, zero-intelligence model of agent behavior and show 
that the market institution exerts considerable power in 
shaping the properties of prices. While not disputing 
that agent behavior might be important, our model sug- 
gests that, at least on the short timescale many of the 
properties of the market are dictated by the market in- 
stitution, and in particular the need to store supply and 
demand. Our model is stochastic and fully dynamic, and 
makes predictions that go beyond the realm of experi- 
mental economics, giving quantitative predictions about 
fundamental properties of a real market. We have devel- 
oped what were previously conceptual toy models in the 
physics literature into a model with testable explanatory 
power. 

This raises questions about the comparison to standard 
models based on the response of valuations to news. The 
idea that news might drive changes in order flow rates 
is compatible with our model. That is, news can drive 
changes in order flow, which in turn cause the best bid 
or ask price to change. But notice that in our model 
there are no assumptions about valuations. Instead, ev- 
erything depends on order flow rates. For example, the 
diffusion rate of prices increases as the 5/2 power of mar- 
ket order flow rate, and thus volatility, which depends on 
the square root of the diffusion rate, increases as the 5/4 
power. Of course, order flow rates can respond to infor- 
mation; an increase in market order rate indicates added 
impatience, which might be driven by changes in valua- 
tion. But changes in long-term valuation could equally 
well cause an increase in limit order flow rate, which de- 
creases volatility. Valuation per se does not determine 
whether volatility will increase or decrease. Our model 
says that volatility does not depend directly on valua- 
tions, but rather on the urgency with which they are 
felt, and the need for immediacy in responding to them. 
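
The scaling argument above can be checked with a line of arithmetic. The following minimal sketch assumes only the two statements in the text (diffusion rate proportional to the 5/2 power of the market order rate, volatility proportional to the square root of the diffusion rate); the function name `volatility_ratio` is ours and purely illustrative.

```python
def volatility_ratio(market_order_rate_ratio: float) -> float:
    """Volatility multiplier when the market order rate is scaled by the given
    factor, with all other order flow rates held fixed (illustrative helper)."""
    diffusion_ratio = market_order_rate_ratio ** 2.5  # diffusion rate ~ rate^(5/2)
    return diffusion_ratio ** 0.5                     # volatility ~ sqrt(diffusion rate)


# Doubling the market order rate multiplies volatility by 2**(5/4).
print(volatility_ratio(2.0))
```

Under these assumptions, a sixteen-fold increase in market order rate would raise volatility by a factor of 32.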

Understanding the shape of the price impact function
was one of the motivations that originally set this project
into motion. The price impact function is closely related
to supply and demand functions, which have been cen- 
tral aspects of economic theory since the 19th century. 
Our model suggests that the shape of price impact func- 
tions in modern markets is significantly influenced not 
so much by strategic thinking as by an economic funda- 
mental: The need to store supply and demand in order 
to provide liquidity. A priori it is surprising that this re- 
quirement alone may be sufficient to dictate at least the 
broad outlines of the price impact curve. 

Our model offers a "divide and conquer" strategy 
to understanding fundamental problems in economics. 
Rather than trying to ground our approach directly on 
assumptions of utility, we break the problem into two 
parts. We provide an understanding of how the statisti- 
cal properties of prices respond to order flow rates, and 
leave the problem open of how order flow rates depend 
on more fundamental assumptions about information and 
utility. Order flow rates have the significant advantage 
that, unlike information, utility, or the cognitive powers 
of an agent, they are directly measurable. We hope that 
by breaking the problem into two pieces, and partially 
solving the second piece, we can ultimately help provide 
a deeper understanding of how markets work. 



APPENDIX A: RELATIONSHIP OF PRICE 
IMPACT TO CUMULATIVE DEPTH 



An important aspect of markets is the immediate liq- 
uidity, by which we mean the immediate response of 
prices to incoming market orders. When a market order 
enters, its execution range depends both on the spread 
and on the depth of the orders in the book. These de- 
termine the sequence of transaction prices produced by 
that order, as well as the instantaneous market impact. 
Long term liquidity depends on the longer term response
of the limit order book, and is characterized by the price
impact function $\phi(\omega, \tau)$ for values of $\tau > 0$. Immediate
liquidity affects short term volatility, and long term
liquidity affects volatility measured over longer timescales.
In this section we address only short term liquidity. We
address volatility on longer timescales in section II B 4.

We characterize liquidity in terms of either the depth 
profile or the price impact. The depth profile n(p,t) is 
the number of shares n at price p at time t. For many 
purposes it is convenient to think in terms of the cumu- 
lative depth profile N, which is the sum of n values up 
(or down) to some price. For convenience we establish a 
reference point at the center of the book where we define
$p = 0$ and $N(0) = 0$. The reference point can be either
the midpoint quote, or the best bid or ask. We also
study the price impact function $\Delta p = \phi(\omega, \tau, t)$, where
$\Delta p$ is the shift in price at time $t + \tau$ caused by an order
of size $\omega$ placed at time $t$. Typically we define $\Delta p$ as the
shift in the midpoint price, though it is also possible to 
use the best bid or ask (Eq. |l|). 

The price impact function and the depth profile are
closely related, but the relationship is not trivial. $N(\Delta p)$
gives the average total number of orders up to a distance
$\Delta p$ away from the origin, whereas to calculate the
price impact we need the average shift $\Delta p$ caused by a
fixed number of orders. Making the identifications
$p \to \Delta p$, $N \to \omega$, and choosing a common reference point,
the instantaneous price impact is the inverse of the
instantaneous cumulative depth, i.e.
$\phi(\omega, 0, t) = N^{-1}(\omega, t)$. This relationship is clearly true
instant by instant. However, it is not true for averages,
since the mean of the inverse is not in general equal to the
inverse of the mean, i.e. $\langle \phi \rangle \neq \langle N \rangle^{-1}$. This is highly
relevant here: because the fluctuations in these functions
are huge, our interest is primarily in their statistical
properties, and in particular the first few moments.
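
The gap between the mean of the inverse and the inverse of the mean can be seen in a toy case. The sketch below is not taken from the simulations: it assumes a hypothetical linear instantaneous depth $N(p, t) = b_t\,p$ with three made-up slopes $b_t$, so that the instantaneous impact of an order of size $\omega$ is $\omega / b_t$.

```python
from statistics import mean

# Hypothetical instantaneous depth slopes b_t, with N(p, t) = b_t * p (made-up values).
slopes = [0.5, 1.0, 2.0]
omega = 1.0  # fixed market order size

# Instant by instant, impact is the inverse of cumulative depth: phi = omega / b_t.
mean_impact = mean(omega / b for b in slopes)

# Inverting the mean depth profile <N>(p) = <b> * p instead of averaging the inverses:
inverse_of_mean_depth = omega / mean(slopes)

print(mean_impact, inverse_of_mean_depth)
```

In this toy case the inverse of the mean depth is smaller than the mean impact, which is the direction expected from convexity of $1/b$.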

A relationship between the moments can be derived as 
follows: 



1. Moment expansion 

There is some subtlety in how we relate the market
impact to the cumulative order count $N(p, t)$. One
eligible definition of market impact $\Delta p$ is the movement
of the midpoint, following the placement of an order
of size $\omega$. If we define the reference point so that
$N(a, t) = 0$, and the market order is a buy, this definition
puts $\omega(\Delta p, t) = N(a + 2\Delta p, t) - N(a, t)$. In words, the
midpoint shift is half the shift in the best offer. An
alternative choice would be to let $\omega(\Delta p, t) = N(\Delta p, t)$, which
would include part of the instantaneous spread in the
definition of impact in midpoint-centered coordinates, or
none of it in ask-centered coordinates. The issue of how
impact is related to $N(p, t)$ is separate from whether the
best ask is set equal to the reference point for prices, and
may be chosen differently to answer different questions.

Under any such definition, however, the impact $\Delta p$ is a
monotonic function of $\omega$ in every instance, so either may
be taken as the independent variable, along with the
index $t$ that labels the instance. We wish to account for the
differences in instance averages $\langle \cdot \rangle$ of $\omega$ and $\Delta p$, regarded
respectively as the dependent variables, in terms of the
fluctuations of either.

In spite of the fact that the density $n(p, t)$ is a highly
discontinuous variable in general, monotonicity of the
cumulative $N(p, t)$ enables us to picture a power series
expansion for $\omega(p, t)$ in $p$, with coefficients that fluctuate in
time. The simplest such expansion that captures much
of the behavior of the simulated output is

$$\omega(p, t) = a(t) + b(t)\,p + \frac{c(t)}{2}\,p^2, \tag{A1}$$

if $p$ is regarded as the independent variable, or

$$p(\omega, t) = \frac{-b(t) + \sqrt{b^2(t) + 2c(t)\,(\omega - a(t))}}{c(t)}, \tag{A2}$$

if $\omega$ is. While the variable $a(t)$ would seem unnecessary
since $\omega$ is zero at $p = 0$, empirically we find that
simultaneous fits to both $\omega$ and $\omega^2$ at low order can be
made better by incorporating the additional freedom of
fluctuations in $a$.

We imagine splitting each $t$-dependent coefficient into
its mean, and a zero-mean fluctuation component, as

$$a(t) = \bar{a} + \delta a(t), \tag{A3}$$

$$b(t) = \bar{b} + \delta b(t), \tag{A4}$$

and

$$c(t) = \bar{c} + \delta c(t). \tag{A5}$$



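The fluctuation components will in general depend on
$\epsilon$. The values of the mean and second moment of the
fluctuations can be extracted from the mean distributions
$\langle \omega \rangle$ and $\langle \omega^2 \rangle$. The mean values come from the linear
expectation:

$$\langle \omega(0) \rangle = \bar{a}, \tag{A6}$$

$$\left. \frac{\partial \langle \omega(p) \rangle}{\partial p} \right|_{p=0} = \bar{b}, \tag{A7}$$

and

$$\left. \frac{\partial^2 \langle \omega(p) \rangle}{\partial p^2} \right|_{p=0} = \bar{c}. \tag{A8}$$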



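Given these, the fluctuations then come from the
quadratic expectation as

$$\langle \omega^2(0) \rangle = \bar{a}^2 + \langle \delta a^2 \rangle, \tag{A9}$$

$$\left. \frac{\partial \langle \omega^2(p) \rangle}{\partial p} \right|_{p=0} = 2\bar{a}\bar{b} + 2\langle \delta a\,\delta b \rangle, \tag{A10}$$

$$\left. \frac{\partial^2 \langle \omega^2(p) \rangle}{\partial p^2} \right|_{p=0} = 2\left(\bar{b}^2 + \bar{a}\bar{c}\right) + 2\langle \delta b^2 + \delta a\,\delta c \rangle, \tag{A11}$$

$$\left. \frac{\partial^3 \langle \omega^2(p) \rangle}{\partial p^3} \right|_{p=0} = 6\left(\bar{b}\bar{c} + \langle \delta b\,\delta c \rangle\right), \tag{A12}$$

and

$$\left. \frac{\partial^4 \langle \omega^2(p) \rangle}{\partial p^4} \right|_{p=0} = 6\left(\bar{c}^2 + \langle \delta c^2 \rangle\right). \tag{A13}$$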



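When $\omega$ is given a specific definition in terms of the
cumulative distribution, its averages become averages over
the density in the order book.

The values of the moments as obtained above may then
be used in a derivative expansion of the inverse function
(A2), making the prediction for the averaged impact

$$\langle p(\omega) \rangle \approx \bar{p}
+ \frac{1}{2}\frac{\partial^2 \bar{p}}{\partial a^2}\langle \delta a^2 \rangle
+ \frac{1}{2}\frac{\partial^2 \bar{p}}{\partial b^2}\langle \delta b^2 \rangle
+ \frac{1}{2}\frac{\partial^2 \bar{p}}{\partial c^2}\langle \delta c^2 \rangle
+ \frac{\partial^2 \bar{p}}{\partial a\,\partial b}\langle \delta a\,\delta b \rangle
+ \frac{\partial^2 \bar{p}}{\partial a\,\partial c}\langle \delta a\,\delta c \rangle
+ \frac{\partial^2 \bar{p}}{\partial b\,\partial c}\langle \delta b\,\delta c \rangle, \tag{A14}$$

where overbar denotes the evaluation of the function (A2)
or its indicated derivative at $a(t) = \bar{a}$, $b(t) = \bar{b}$, $c(t) = \bar{c}$, and $\omega$.
The fluctuations $\langle \delta b^2 \rangle$ and $\langle \delta a\,\delta c \rangle$ cannot be determined
independently from Eq. (A11). However, in keeping with
this fact, their coefficient functions in Eq. (A14) are
identical, so the inversion remains fully specified.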
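If we denote by $Z$ the radical

$$Z \equiv \sqrt{b^2 + 2c\,(\omega - a)}, \tag{A15}$$

the various partial derivative functions in Eq. (A14) evaluate to

$$\frac{1}{2}\frac{\partial^2 p}{\partial a^2} = -\frac{c}{2Z^3}, \tag{A16}$$

$$\frac{\partial^2 p}{\partial a\,\partial b} = \frac{b}{Z^3}, \tag{A17}$$

$$\frac{1}{2}\frac{\partial^2 p}{\partial c^2} = \frac{Z - b}{c^3} - \frac{\omega - a}{c^2 Z} - \frac{(\omega - a)^2}{2cZ^3}, \tag{A18}$$

$$\frac{\partial^2 p}{\partial b\,\partial c} = \frac{1}{c^2}\left(1 - \frac{b}{Z}\right) - \frac{(\omega - a)\,b}{cZ^3}, \tag{A19}$$

and

$$\frac{1}{2}\frac{\partial^2 p}{\partial b^2} = \frac{\partial^2 p}{\partial a\,\partial c} = \frac{\omega - a}{Z^3}. \tag{A20}$$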



Plugging these into Eq. (A14) gives the predicted mean
price impact, compared to the actual mean in Fig. 31. Here
the measure used for price impact is the movement of the
ask from buy market orders. The cumulative order
distribution is computed in ask-centered coordinates,
eliminating the contribution from the half-spread in the $p$
coordinate. The inverse of the mean cumulative
distribution (dotted), which corresponds to $\bar{p}$ in Eq. (A14),
clearly underestimates the actual mean impact (solid).
However, the corrections from only second-order
fluctuations in $a$, $b$, and $c$ account for much of the difference at
all values of $\epsilon$.

FIG. 31: Comparison of the inverse mean cumulative order
distribution $\bar{p}$ (dot), to the actual mean impact (solid), and
the second-order fluctuation expansion (A14, dash). (a): $\epsilon$ =
0.2. (b): $\epsilon$ = 0.02. (c): $\epsilon$ = 0.002.
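
The logic of the second-order expansion can be illustrated on synthetic coefficients. The sketch below is not the authors' computation and uses no simulation output: it assumes the quadratic depth $\omega = a + b p + (c/2) p^2$ with inverse $p = (-b + \sqrt{b^2 + 2c(\omega - a)})/c$, draws hypothetical independent uniform fluctuations in $a$, $b$, $c$ (so all cross-covariances vanish), and compares the sampled mean impact with the inverse of the mean profile and with the second-order prediction. The means, standard deviations, and function names are all illustrative assumptions; the second derivatives are obtained by differentiating the inverse by hand.

```python
import math
import random


def impact(omega, a, b, c):
    """Positive root of omega = a + b*p + (c/2)*p**2, solved for p."""
    return (-b + math.sqrt(b * b + 2.0 * c * (omega - a))) / c


def diagonal_second_derivatives(omega, a, b, c):
    """d2p/da2, d2p/db2, d2p/dc2 of impact(), differentiated by hand."""
    z = math.sqrt(b * b + 2.0 * c * (omega - a))
    w = omega - a
    d_aa = -c / z**3
    d_bb = 2.0 * w / z**3
    d_cc = 2.0 * (z - b) / c**3 - 2.0 * w / (c * c * z) - w * w / (c * z**3)
    return d_aa, d_bb, d_cc


def compare(omega=2.0, means=(0.0, 1.0, 0.5), sds=(0.05, 0.3, 0.1),
            n_draws=200_000, seed=0):
    """Return (inverse of mean profile, second-order prediction, sampled mean)."""
    rng = random.Random(seed)
    naive = impact(omega, *means)
    d_aa, d_bb, d_cc = diagonal_second_derivatives(omega, *means)
    # Independent fluctuations: only the three variances contribute.
    predicted = naive + 0.5 * (d_aa * sds[0] ** 2
                               + d_bb * sds[1] ** 2
                               + d_cc * sds[2] ** 2)
    # Uniform noise on [-h, h] has standard deviation h / sqrt(3).
    half_widths = [math.sqrt(3.0) * s for s in sds]
    total = 0.0
    for _ in range(n_draws):
        a, b, c = (m + rng.uniform(-h, h) for m, h in zip(means, half_widths))
        total += impact(omega, a, b, c)
    return naive, predicted, total / n_draws


naive, predicted, sampled = compare()
print(naive, predicted, sampled)
```

With these made-up inputs the inverse of the mean profile falls below the sampled mean, and the variance corrections recover most of the gap, mirroring the qualitative behavior described for Fig. 31.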



2. Quantiles 

Another way to characterize the relationship between
depth profile and market impact is in terms of their
quantiles (the fraction greater than a given value; for example,
the median is the 0.5 quantile). Interestingly, the
relationship between quantiles is trivial. Letting $Q_r(x)$ be
the $r$th quantile of $x$, because the cumulative depth
$N(p)$ is a non-decreasing function with inverse $p = \phi(N)$,
we have the relation

$$Q_r(\phi) = \left(Q_{1-r}(N)\right)^{-1}. \tag{A21}$$



This provides an easy and accurate way to compare depth 
and price impact when the tick size is sufficiently small. 
However, when the tick size is very coarse, the quantiles 
are in general not very useful, because unlike the mean, 
the quantiles do not vary continuously, and only take on 
a few discrete values. 
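
The quantile relation can be illustrated on a toy ensemble. The sketch below assumes a hypothetical linear depth $N(p, t) = b_t\,p$ with made-up slopes, so that the impact of an order of size $\omega$ is $\omega / b_t$; `upper_quantile` is our own helper implementing the "fraction greater than a given value" convention with a simple order-statistic rule, and the sample size is chosen so that the index is an integer.

```python
def upper_quantile(values, r):
    """Value exceeded by a fraction r of the sample (order-statistic rule;
    r * (n - 1) is assumed to be an integer for the sample sizes used here)."""
    ordered = sorted(values, reverse=True)
    return ordered[round(r * (len(ordered) - 1))]


# Hypothetical instantaneous depth slopes: N(p, t) = b_t * p (made-up values).
slopes = [0.5 + 0.02 * k for k in range(101)]
omega, r = 3.0, 0.25

# r-quantile of the impact phi = omega / b_t at fixed order size.
impact_quantile = upper_quantile([omega / b for b in slopes], r)

# The (1 - r)-quantile of depth is the line p -> q * p; invert it at omega.
q = upper_quantile(slopes, 1.0 - r)
inverse_depth_quantile = omega / q

print(impact_quantile, inverse_depth_quantile)
```

Because the map from depth to impact is monotone decreasing, the two numbers coincide, with the quantile level flipped from $r$ to $1 - r$.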

As we have argued in the previous section, in nondi- 
mensional coordinates all of the properties of the limit 
order book are described by the two dimensionless scale 
factors $\epsilon$ and $dp/p_c$ (see table ([n]). When expressed in
dimensionless coordinates, any property, such as depth, 
spread, or price impact, can only depend on these two 
parameters. This reduces the search space from five di- 
mensions to two, which greatly simplifies the analysis. 
Any results can easily be re-expressed in dimensional co- 
ordinates from the definitions of the dimensionless pa- 
rameters. 
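
As an illustration of the reduction from five parameters to two, the sketch below maps the dimensional parameters to the two dimensionless ones and checks that a change of units leaves them unchanged. The definitions $p_c = \mu / 2\alpha$ and $\epsilon = 2\delta\sigma/\mu$ are not stated in this appendix; they are assumed here to match the table referenced above, and the function name and numerical values are purely illustrative.

```python
def dimensionless_parameters(alpha, mu, delta, sigma, dp):
    """Map (alpha, mu, delta, sigma, dp) to (epsilon, dp / p_c).

    ASSUMED definitions (not given in this appendix):
    p_c = mu / (2 * alpha), epsilon = 2 * delta * sigma / mu.
    Assumed units: alpha in shares/(price*time), mu in shares/time,
    delta in 1/time, sigma in shares, dp in price.
    """
    p_c = mu / (2.0 * alpha)
    epsilon = 2.0 * delta * sigma / mu
    return epsilon, dp / p_c


base = dimensionless_parameters(alpha=0.5, mu=1.0, delta=0.2, sigma=0.05, dp=0.01)

# Change units: shares -> s * shares, price -> q * price, time -> tau * time.
s, q, tau = 100.0, 8.0, 60.0
rescaled = dimensionless_parameters(
    alpha=0.5 * s / (q * tau), mu=1.0 * s / tau, delta=0.2 / tau,
    sigma=0.05 * s, dp=0.01 * q,
)
print(base, rescaled)
```

Both calls return the same pair, which is the sense in which only these two combinations can matter for nondimensionalized depth, spread, or price impact.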



APPENDIX B: SUPPORTING CALCULATIONS 
IN DENSITY COORDINATES 

The following two subsections provide details for the 
master equation solution in density coordinates. The first 
provides the generating functional solution for the den- 
sity at general dp, and the second the approximate source 
term for correlated fluctuations. 



1. Generating functional at general bin width 

As in the main text, $\alpha$ and $\mu$ represent the functions
of $p$ everywhere in this subsection, because the boundary
values do not propagate globally. Eq. (^) can be solved
by assuming there is a convergent expansion in (formal)
small $D/\delta$,

$$\Pi \equiv \sum_j \left(\frac{D}{\delta}\right)^j \Pi_j, \tag{B1}$$

and it is convenient to embellish the shorthand notation
as well, with

$$\Pi_j(0, p) \equiv \pi_{0j}(p). \tag{B2}$$

It follows that expected number also expands as

$$\langle n \rangle \equiv \sum_j \left(\frac{D}{\delta}\right)^j \langle n \rangle_j. \tag{B3}$$



Order by order in D/6, Eq. ( p4| ) requires 
d adp — fi/2X\ fi 

IT = — r7T0 



9 Il 3 -i 

dX 6a )" 2 26 aX"" 3 dp 2 (A - 1) ' 

(B4) 



Because $\Pi_j$ have been introduced in order to be chosen
homogeneous of degree zero in $D/\delta$, the normalization
condition requires that

$$\Pi_0(1, p) = 1, \qquad \Pi_{j \neq 0}(1, p) = 0, \qquad \forall p. \tag{B5}$$



The implied recursion relations for expected occupation 
numbers are 



at j = 0, and 



otherwise. 



adp n 
Wo = — - 25 (1 _ ^oo) , 



(B6) 



(B7) 



Eq. (B4) is solved immediately by use of an integrating 
factor, to give the recursive integral relation 



ILj(X) = TT 0j 



1 + ^(A) 
o a 



where 



dp 2 X - 1 



1(A) = A / dze {adp/Sa)(1 - z) z^ /2Sa) , 
Jo 



(B8) 



(B9) 



and 



dp 2 A — 1 / / \ 



X 



i (adp/S *)(!-*) J»/2S a) & n j~l ( Xz ) 

i(x)J dp 2 xz-i ■ 

(BIO) 



The surface condition (B5) provides the starting point
for this recursion, by giving at $j = 0$

$$\pi_{00} = \frac{1}{1 + (\alpha\,dp/\delta\sigma)\,\mathcal{I}(1)}. \tag{B11}$$



Given forms for $\alpha$ and $\mu$, Eq. (B6) may be solved directly
from Eq. (B11), and extended by Eq. (B7) to solve for
$\langle n(p) \rangle$. More generally, equations (B9), (B10), and (B11)
may be solved to any desired order numerically, to obtain
the fluctuation characteristics of $n(p)$. Finding the
solution becomes difficult, however, when $\alpha$ and $\mu$ must be
related self-consistently to the solutions for $\Pi$. The
special case $dp \to 0$ admits a drastic simplification, in which
the whole expansion for $\langle n(p) \rangle$ may be directly summed,
to recover the result in the main text. In this limit, one
gets a single differential equation in $p$ which is solvable
by numerical integration. The existence and regularity of
this solution demonstrates the existence of a continuum
limit on the price space, and can be simulated directly
by allowing orders to be placed at arbitrary real-valued
prices.

a. Recovering the continuum limit for prices

In the limit that the dimensionless quantity $\alpha\,dp/\delta\sigma \to 0$,
Eq. (B9) simplifies to

$$\mathcal{I}(\lambda) \to \frac{\lambda}{1 + \mu/2\delta\sigma} + \mathcal{O}(dp), \tag{B12}$$

from which it follows that

$$\langle n \rangle_0 \to \frac{\alpha\,dp/\delta}{1 + \mu/2\delta\sigma} + \mathcal{O}(dp^2). \tag{B13}$$

The important simplification given by vanishing dp, as 
will be seen below, is that the expansion ( |l9| ) collapses, 
at leading order in dp, to 



n (A)-l + (A-l)^ + o(d P 2 ) 



(B14) 



Eq. ( B14 ) is used as the input to an inductive hypothesis 



n J _ 1 (A)^n 3 _ 1 (i) + (A-i) 



-0(d P 2 ), (B15) 



(n. b. ~ O (dp), Hj-i (1) = either 1 or 0), which 

with Eq. ( |B10 ), then recovers the condition at j: 



n, (a) - (A 



-l)l(l)±_ K -2lzl +(D (dp 2 ). (B16) 



dp 2 a 



Using Eq. © at $\lambda \to 1$, and Eq. (B12) for $\mathcal{I}$, gives the
recursion for the number density

The sum (B3) for $\langle n \rangle$ is then



v (- 1 d2 V 



(B18) 



Using Eq. (B13) for $\langle n \rangle_0$ and re-arranging terms,
Eq. (B18) is equivalent to

1 ^ (D d 2 1 V adp 

(B19) 

The series expansion in the price Laplacian is formally 
the geometric sum 



(l + p/26a) (n) 



D d 2 1 

5 dp 2 l + fi/2Sa 



adp 
(B20) 

which can be inverted to give Eq. (27), a relation that is
local in derivatives.



2. Cataloging correlations

A correct source term $S$ must correlate the incidences
of zero occupation with the events producing shifts. It
is convenient to separate these into the four independent
types of deposition and removal.

First we consider removal of buy limit orders, which
generates a negative shift of the midpoint. Let $a'$ denote
the position of the ask after the shift. Then all possible
shifts $\Delta p$ are related to a given price bin $p$ and $a'$ in one of
three ordering cases, shown in Table VII. For each case,
the source term corresponding to $[\psi(p - \Delta p) - \psi(p)]$ in
Eq. (43) is given, together with the measure of order-book
configurations for which that case occurs. The mean-field
assumption (56) is used to estimate these measures.

case                                  source                            prob
$\Delta p < a' < p$                   $\psi(p - \Delta p) - \psi(p)$    $\varphi(\Delta p) - \varphi(p)$
$\Delta p < p < a' < p + \Delta p$    $0 - \psi(p)$                     $\varphi(p) - \varphi(p + \Delta p)$
$p < \Delta p < a' < p + \Delta p$    $0 - \psi(p)$                     $\varphi(\Delta p) - \varphi(p + \Delta p)$

TABLE VII: Contributions to "effective $P_-$" from removal
of a buy limit order, conditioned on the position of the ask
relative to $p$.

As argued when defining $\beta$ in the simpler diffusion
approximation for the source terms, the measure of shifts
from removal of either buy or sell limit orders should be
symmetric with that of their addition within the spread,
which is $2\,d\Delta p$ for either type, in cases when the shift
$\pm\Delta p$ is consistent with the value of the spread. The only
change in these more detailed source terms is replacement
of the simple $\Pr(a > \Delta p)$ with the entries in the
third column of Table VII. When the $\Delta p$ cases are
integrated over their range as specified in the first column
and summed, the result is a contribution to $S$ of

$$\int_0^p 2\,d\Delta p\;\psi(p - \Delta p)\,[\varphi(\Delta p) - \varphi(p)]
- \int_0^\infty 2\,d\Delta p\;\psi(p)\,[\varphi(\Delta p) - \varphi(p + \Delta p)]. \tag{B21}$$

Sell limit-order removals generate another sequence of
cases, symmetric with the buys, but inducing positive
shifts. The cases, source terms, and frequencies are given
in Table VIII. Their contribution to $S$, after integration
over $\Delta p$, is then

$$\int_0^\infty 2\,d\Delta p\;\psi(p + \Delta p)\,[\varphi(\Delta p) - \varphi(p + \Delta p)]
- \int_0^p 2\,d\Delta p\;\psi(p)\,[\varphi(\Delta p) - \varphi(p)]. \tag{B22}$$

case                                  source                            prob
$\Delta p < a' < p$                   $\psi(p + \Delta p) - \psi(p)$    $\varphi(\Delta p) - \varphi(p)$
$\Delta p < p < a' < p + \Delta p$    $\psi(p + \Delta p) - 0$          $\varphi(p) - \varphi(p + \Delta p)$
$p < \Delta p < a' < p + \Delta p$    $\psi(p + \Delta p) - 0$          $\varphi(\Delta p) - \varphi(p + \Delta p)$

TABLE VIII: Contributions to "effective $P_+$" from removal
of a sell limit order, conditioned on the position of the ask
relative to $p$.

Order addition is treated similarly, except that $a$
denotes the position of the ask before the event. Sell
limit-order additions generate negative shifts, with the cases
shown in Table IX. Integration over $\Delta p$ consistent with
these cases gives the negative-shift contribution to $S$

$$\int_0^{p/2} 2\,d\Delta p\;\psi(p - \Delta p)\,[\varphi(\Delta p) - \varphi(p - \Delta p)]
- \int_0^p 2\,d\Delta p\;\psi(p)\,[\varphi(\Delta p) - \varphi(p)]. \tag{B23}$$

case                                  source                            prob
$\Delta p < a < p - \Delta p$         $\psi(p - \Delta p) - \psi(p)$    $\varphi(\Delta p) - \varphi(p - \Delta p)$
$\Delta p < p - \Delta p < a < p$     $0 - \psi(p)$                     $\varphi(p - \Delta p) - \varphi(p)$
$p - \Delta p < \Delta p < a < p$     $0 - \psi(p)$                     $\varphi(\Delta p) - \varphi(p)$

TABLE IX: Contributions to "effective $P_-$" from addition
of a sell limit order, conditioned on the position of the ask
relative to $p$.

The corresponding buy limit-order addition cases are
given in Table X, and their positive-shift contribution to
$S$ turns out to be the same as that from removal of sell
limit orders (B22).

case                                  source                            prob
$\Delta p < a' < p$                   $\psi(p + \Delta p) - \psi(p)$    $\varphi(\Delta p) - \varphi(p)$
$\Delta p < p < a' < p + \Delta p$    $\psi(p + \Delta p) - 0$          $\varphi(p) - \varphi(p + \Delta p)$
$p < \Delta p < a' < p + \Delta p$    $\psi(p + \Delta p) - 0$          $\varphi(\Delta p) - \varphi(p + \Delta p)$

TABLE X: Contributions to "effective $P_+$" from addition
of a buy limit order, conditioned on the position of the ask
relative to $p$.

Writing the source as a sum of two terms $S = S_{\rm buy} + S_{\rm sell}$,
the combined contribution from buy limit-order
additions and removals is

$$S_{\rm buy}(p) = \int_0^p 2\,d\Delta p\;[\psi(p - \Delta p) - \psi(p)]\,[\varphi(\Delta p) - \varphi(p)]
+ \int_0^\infty 2\,d\Delta p\;[\psi(p + \Delta p) - \psi(p)]\,[\varphi(\Delta p) - \varphi(p + \Delta p)]. \tag{B24}$$

The corresponding source term from sell order addition
and removal is

$$S_{\rm sell}(p) = \int_0^{p/2} 2\,d\Delta p\;\psi(p - \Delta p)\,[\varphi(\Delta p) - \varphi(p - \Delta p)]
- 2\int_0^p 2\,d\Delta p\;\psi(p)\,[\varphi(\Delta p) - \varphi(p)]
+ \int_0^\infty 2\,d\Delta p\;\psi(p + \Delta p)\,[\varphi(\Delta p) - \varphi(p + \Delta p)]. \tag{B25}$$

The forms (B24) and (B25) do not lead to $\int dp\,S = 0$,
and correcting this presumably requires distributing
the orders erroneously transported through the midpoint
by the diffusion term, to interior locations where they
then influence long-time diffusion autocorrelation. These
source terms manifestly satisfy $S(0) = 0$, though, and
that determines the intercept of the average order depth.


a. Getting the intercept right 

Evaluating Eq. with $\alpha(p)/\alpha(\infty) = 1 +
\Pr(s/2 > p)$, at $p = 0$ gives the boundary value of the
nondimensionalized, midpoint-centered, mean order
density



V; (0) = 



1 



which dimensionalizes to 

(71 (0)) _ 



2a (00) I a 
adp ~ n(0) /2a + 5' 



(B26) 



(B27) 



Eq. (B27), for the total density, is the same as the
form (B6) produced by the diffusion solution for the
zeroth order density, as should be the case if diffusion
no longer transports orders through the midpoint. This
form is verified in simulations, with midpoint-centered
averaging.

Interestingly, the same argument for the bid-centered
frame would simply omit the $\varphi$ from $\alpha(0)/\alpha(\infty)$,
predicting that



^(0) 



1 



1 



(B28) 



a result which is not confirmed in simulations. Thus, in 
addition to not satisfying the mean-field approximation, 
the bid-centered density average appears to receive some 
diffusive transport of orders all the way down to the bid. 



b. Fokker-Planck expanding correlations 



Equations (B24) and (B25) are not directly easy to use
in a numerical integral. However, they can be
Fokker-Planck expanded to terms with behavior comparable to
the diffusion equation, and the correct behavior near the
midpoint. Doing so gives the nondimensional expansion
of the source term $S$ corresponding to the diffusion
contribution in Eq. @:

$$S = \mathcal{R}(p)\,\psi(p) + \mathcal{P}(p)\,\frac{d\psi(p)}{dp} + \epsilon\beta(p)\,\frac{d^2\psi(p)}{dp^2}. \tag{B29}$$

The rate terms in Eq. (B29) are integrals defined as

$$\mathcal{R}(p) = \int_0^\infty 2\,d\Delta p\,[\varphi(\Delta p) - \varphi(p + \Delta p)]
- 2\int_0^p 2\,d\Delta p\,[\varphi(\Delta p) - \varphi(p)]
+ \int_0^{p/2} 2\,d\Delta p\,[\varphi(\Delta p) - \varphi(p - \Delta p)], \tag{B30}$$

$$\mathcal{P}(p) = 2\int_0^\infty 2\,d\Delta p\,\Delta p\,[\varphi(\Delta p) - \varphi(p + \Delta p)]
- \int_0^p 2\,d\Delta p\,\Delta p\,[\varphi(\Delta p) - \varphi(p)]
- \int_0^{p/2} 2\,d\Delta p\,\Delta p\,[\varphi(\Delta p) - \varphi(p - \Delta p)], \tag{B31}$$

and

$$\epsilon\beta(p) = \int_0^\infty 2\,d\Delta p\,(\Delta p)^2\,[\varphi(\Delta p) - \varphi(p + \Delta p)]
+ \frac{1}{2}\int_0^p 2\,d\Delta p\,(\Delta p)^2\,[\varphi(\Delta p) - \varphi(p)]
+ \frac{1}{2}\int_0^{p/2} 2\,d\Delta p\,(\Delta p)^2\,[\varphi(\Delta p) - \varphi(p - \Delta p)]. \tag{B32}$$

All of the coefficients (B30 - B32) vanish manifestly
as $p \to 0$, and at large $p$, $\mathcal{R}, \mathcal{P} \to 0$, while
$\epsilon\beta(p) \to 4\int_0^\infty d\Delta p\,(\Delta p)^2\,\varphi(\Delta p)$, recovering the diffusion
constant (Ell) of the simplified source term. However,
they are still not convenient for numerical integration,
being nonlocal in $\varphi$.
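
The limiting behavior of the three coefficients can be checked by brute-force quadrature. The sketch below assumes the integral forms $\mathcal{R}$, $\mathcal{P}$, $\epsilon\beta$ as sums of three integrals with measure $2\,d\Delta p\,(\Delta p)^k$ over $[0, \infty)$, $[0, p]$, and $[0, p/2]$, with integrands $\varphi(\Delta p) - \varphi(p + \Delta p)$, $\varphi(\Delta p) - \varphi(p)$, and $\varphi(\Delta p) - \varphi(p - \Delta p)$ and weights $(1, -2, 1)$, $(2, -1, -1)$, $(1, \tfrac12, \tfrac12)$ respectively. The choice $\varphi(x) = e^{-x}$ is purely illustrative and is not the model's spread distribution; the cutoff and step count are likewise arbitrary choices of ours.

```python
import math


def integrate(f, lo, hi, n=4000):
    """Midpoint rule for the integral of f over [lo, hi]."""
    if hi <= lo:
        return 0.0
    h = (hi - lo) / n
    return h * sum(f(lo + (k + 0.5) * h) for k in range(n))


def phi(x):
    """ILLUSTRATIVE stand-in for the spread distribution: exp(-x)."""
    return math.exp(-x)


CUTOFF = 40.0  # stands in for the infinite upper limit


def coefficient(p, power, weights):
    """Weighted sum of the three integrals with measure 2 dDp * Dp**power."""
    w_far, w_mid, w_near = weights
    far = integrate(lambda d: 2 * d**power * (phi(d) - phi(p + d)), 0.0, CUTOFF)
    mid = integrate(lambda d: 2 * d**power * (phi(d) - phi(p)), 0.0, p)
    near = integrate(lambda d: 2 * d**power * (phi(d) - phi(p - d)), 0.0, p / 2)
    return w_far * far + w_mid * mid + w_near * near


def rate_R(p):
    return coefficient(p, 0, (1.0, -2.0, 1.0))


def rate_P(p):
    return coefficient(p, 1, (2.0, -1.0, -1.0))


def rate_eps_beta(p):
    return coefficient(p, 2, (1.0, 0.5, 0.5))


# Large-p limit of eps*beta should be 4 * integral of Dp**2 * phi(Dp), i.e. 8 here.
print(rate_R(0.0), rate_P(0.0), rate_eps_beta(0.0))
print(rate_R(30.0), rate_P(30.0), rate_eps_beta(30.0))
```

For this choice of $\varphi$ all three coefficients are zero at $p = 0$, and at $p = 30$ the first two are negligible while the third is close to $4\int_0^\infty d\Delta p\,(\Delta p)^2 e^{-\Delta p} = 8$.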

The exponential form (33) is therefore exploited to
approximate $\varphi$, in the region where its value is largest, with



the expansion 

ip (p ± Ap) m tp (p) <p (Ap) e ±p Apav/ap| ( B33 ) 

In the range where the mean-field approximation is valid, 
ip is dominated by the constant term tp (0), and even the 
factors A P d ^/ d P\o can be approximated as unity. This 
leaves the much-simplified expansions 

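$$\mathcal{R}(p) = [1 - \varphi(p)]\,\mathcal{I}_0(\infty)
- 2\,[\mathcal{I}_0(p) - 2p\,\varphi(p)]
+ [1 - \varphi(p)]\,\mathcal{I}_0(p/2), \tag{B34}$$

$$\mathcal{P}(p) = 2\,[1 - \varphi(p)]\,\mathcal{I}_1(\infty)
- [\mathcal{I}_1(p) - p^2\,\varphi(p)]
- [1 - \varphi(p)]\,\mathcal{I}_1(p/2), \tag{B35}$$

and

$$\epsilon\beta(p) = [1 - \varphi(p)]\,\mathcal{I}_2(\infty)
+ \frac{1}{2}\left[\mathcal{I}_2(p) - \frac{2}{3}\,p^3\,\varphi(p)\right]
+ \frac{1}{2}\,[1 - \varphi(p)]\,\mathcal{I}_2(p/2). \tag{B36}$$

In Equations (B34 - B36),

$$\mathcal{I}_j(p) \equiv \int_0^p 2\,d\Delta p\,(\Delta p)^j\,\varphi(\Delta p), \tag{B37}$$

for $j = 0, 1, 2$. These forms (B34 - B36) are inserted in
Eq. (B29) for $S$ to produce the mean-field results
compared to simulations in Fig. |2(1 - Fig. p2[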